ABSTRACT


Empirical Orthogonal Function (EOF) analysis is used to
describe the synoptic forcing features of selected northwestern
Pacific Ocean tropical cyclones from 1967 to 1976. EOF analysis
is applied to the geopotential field at 850, 700 and 500mb
on a 120-point grid with 5-degree latitude and longitude
spacing that is centered on the storm. The 120 EOF coefficients
(for each level) are computed for a sample of 454
cases in the history file. The coefficient vectors are truncated
to the first 10 coefficients, based on the Monte Carlo
selection criteria of Preisendorfer and Barnett. These coefficients
describe about 85% of the variance in the fields. The
synoptic forcing represented by the EOF coefficients is then
used as a predictor in a regression analysis track forecast
scheme, along with past storm movement and intensity during
the past 36 hours. The EOF-based regression equations are
verified over an independent sample of 50 storms, and the
position errors are compared to the official Joint Typhoon
Warning Center (JTWC) forecast errors. The EOF-based regression
equations give, on the average, a 17% reduction in error when
compared to the official forecast issued by JTWC. Over the
independent sample, the 500mb equations performed better than
the equations of the other two levels.
I. INTRODUCTION


Tropical storms spawned over the western North Pacific
Ocean genesis region have a great impact on both civilian and
military populations; accurate movement forecasts are critical
to reduce their impact upon these communities. The Joint
Typhoon Warning Center (JTWC), Guam, Marianas Islands, issues
the official forecast (to United States military agencies)
of tropical storm movement and intensity for storms generated
in this region. Using current forecast techniques, these
official forecasts have an average forecast position error on
the order of 120, 240 and 360 nautical miles for 24-, 48-, and
72-hour forecasts, respectively (Annual Typhoon Report, JTWC,
1981). There is potential for improvement.

Present forecast techniques for tropical storm movement
may be generally categorized as being either statistical (which
includes analog techniques) or dynamical. The motivation
driving the two types of forecasts differs greatly. Statistical
forecasts typically use regression or analog methods with
all available historical storms having archived data to produce
a statistically optimal position forecast. Regression
analysis methods assume that certain variables deterministically
correlate with future storm displacement. These correlated
variables are then used in a regression analysis to produce a
forecast. Leftwich and Neumann (1977), for example, use a
second order polynomial regression with seven primary predictors
to forecast typhoon movement. The seven predictors include
Julian date, initial latitude and longitude, and past 12-
and 24-hour zonal and meridional movement. Since they used
polynomial regression, these seven primary predictors actually
give rise to 35 predictors when the second order predictors
are formed. Using these predictors, Leftwich and Neumann
were able to account for 65% of the variation in the zonal
displacement and 53% of the variation in the meridional
displacement for 12 hours. Over a 72-hour period, the amount of
explained variance became progressively smaller. Analog
techniques (e.g., Jarrell and Sommervell, 1970) use the historical
file of storms to identify storms, and the surrounding
environmental fields, that have strong similarities to the
present storm. Then, a weighted similarity index of certain
variables is used to select those storms in the history file
that are most similar to the present storm. A weighted average
of the selected storm tracks is the basis of the forecast
movement of the present storm. The justification for using
this technique is that a storm with similar location and
surrounding fields should also have a similar track. Jarrell
and Sommervell (1970) present an analog scheme which is the
original version of the scheme used presently at JTWC.
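
The predictor count quoted above for a second order polynomial regression can be checked with a short sketch. It assumes the 35 predictors are the seven linear terms plus all squares and pairwise cross products; the predictor names are hypothetical labels chosen for illustration and are not taken from Leftwich and Neumann (1977).

```python
from itertools import combinations_with_replacement

# Hypothetical labels for the seven primary predictors described in the text.
primary = [
    "julian_date", "initial_lat", "initial_lon",
    "zonal_12h", "meridional_12h", "zonal_24h", "meridional_24h",
]


def second_order_terms(names):
    """Return the linear terms plus all squares and cross products."""
    linear = [(name,) for name in names]
    # Unordered pairs with repetition: squares (x, x) and cross products (x, y).
    quadratic = list(combinations_with_replacement(names, 2))
    return linear + quadratic


terms = second_order_terms(primary)
print(len(primary), "primary predictors ->", len(terms), "regression predictors")
```

Under this assumption the count is 7 linear terms, 7 squares and 21 cross products, which agrees with the 35 predictors stated in the text.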

In contrast to the statistical methods, dynamic forecast
techniques assume that the motion of the storm may be forecast
directly from numerical integration of geophysical
governing equations (momentum, continuity and thermodynamic
equations, for example). Harrison (1973) presents a simple
nested grid model to forecast typhoon movement using the
primitive equations. This is the original version of the
operational nested tropical cyclone model available at JTWC
(Harrison, 1981).

Both statistical and dynamical forecast methods have
weaknesses. The statistical methods have two primary problems.
First, since they are based on historical data cases, any
storm that has an unusual motion is not likely to be forecast
well. Second, the use of statistical methods tends to
homogenize (smooth) the forecast. Forecasts using a blend of
similar past history storms are typically insensitive to
subtle differences in the synoptic (dynamic) forcing fields.
Thus, purely statistical methods are deficient in forecasting
the unusual case and are unable to distinguish subtle
differences in the synoptic-scale fields.

Dynamic forecasts, on the other hand, have limitations
in both theory and cost. Due to the smallness of the Coriolis
parameter in tropical regions, a geostrophic relationship is
not feasible. This makes initialization of fields difficult
and increases the probability that any erroneous data points
will deteriorate the numerical forecast rapidly. Convective
heating is a driving mechanism for development of tropical
storms, rather than baroclinic instability as in the
mid-latitudes. Unfortunately, convective heating is very difficult
to model (Haltiner and Williams, 1980). Therefore, the governing
equations are suspect in the tropics, due to poor
initialization and modeling of convective heating. An even
greater problem is that interaction between different scales
of motion is critical to maintain an energy balance in the
tropical cyclone. If the grid spacing is not small enough,
the energy balance will be altered, possibly giving spurious
solutions. For this reason, the grid must have very fine
resolution to simulate this interaction numerically. The cost
of numerical integration on a fine grid can be very large due
to the Courant-Friedrichs-Lewy (CFL) condition, which requires
smaller integration time steps as the grid spacing decreases
(Haltiner and Williams, 1980). An additional problem with a
fine grid model is that there are generally inadequate wind
and mass observations to initialize the numerical model in the
tropics, and this problem grows as the grid size is
reduced.
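
The cost argument above can be illustrated with a minimal sketch. It assumes the simplest one-dimensional form of the CFL bound, c * dt / dx <= 1, and an illustrative fastest wave speed of 300 m/s; the exact bound depends on the numerical scheme, and none of these numbers come from the models cited in the text.

```python
def max_stable_time_step(grid_spacing_m, fastest_wave_speed_ms, courant_limit=1.0):
    """Largest time step satisfying c * dt / dx <= courant_limit."""
    return courant_limit * grid_spacing_m / fastest_wave_speed_ms


def relative_cost(refinement_factor):
    """Cost relative to the coarse grid when the horizontal spacing is divided
    by refinement_factor: grid points grow as factor**2 (two horizontal
    dimensions) and the number of time steps grows as factor."""
    return refinement_factor ** 3


WAVE_SPEED_MS = 300.0  # assumed fastest signal speed, for illustration only

for spacing_km in (200.0, 100.0, 50.0):
    dt = max_stable_time_step(spacing_km * 1000.0, WAVE_SPEED_MS)
    print(f"dx = {spacing_km:5.0f} km -> dt <= {dt:6.1f} s")

print("cost factor for halving the spacing:", relative_cost(2))
```

Halving the grid spacing halves the permitted time step, so under these assumptions the work for a fixed forecast period grows roughly with the cube of the refinement factor.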

With the difficulties in both types of forecasting methods,
an alternative method is proposed here. This study will employ
Empirical Orthogonal Functions (EOF's) to represent
numerically the large scale synoptic (dynamic) fields. Then,
these functions will be used to forecast the tropical storm
movement using regression equations. This approach is novel
for forecasting of tropical storm movement, in the sense that
previous regression analysis methods (Leftwich and Neumann,
1977, for example) have not incorporated the entire synoptic
forcing field. If an attempt is made to develop a simple linear
regression model using a large synoptic field, the number
of predictors becomes prohibitive, as each grid point value
relative to the storm would be a predictor. Early analog studies
used only a single feature from the synoptic chart, such as
the 700mb trough longitude to the north of the storm, to
represent the synoptic field. This study will use the Empirical
Orthogonal Function representation of the entire synoptic forcing
field around the tropical storm. Therefore, in a broad sense,
this approach may be thought of as a dynamically-based
statistical forecast scheme. This type of approach is not totally
without precedent. Lorenz (1977) states:

In an informal conversation in which this writer
(Lorenz) took part in about 20 years ago, the
question arose as to how the best system for producing
the operational objective 24 h prog could
be developed, if the system had to be ready within
one year. We more or less agreed that the further
improvements in numerical weather prediction to be
expected in a single year would be small, and that
the greatest gains would come from an empirical
scheme in which the numerically produced prognostic
charts, or "numerical progs" would enter as
predictors....

Substitution of "improved tropical forecast scheme" for "24 h
prog" in the quotation gives the basis and purpose of this
study.

Empirical Orthogonal Function analysis allows a field with
many grid points to be represented by a linear combination of
a few constant vectors and variable coefficients, while
retaining a large portion of the total variation (from the mean
state) in the field. Thus, a synoptic field with many grid
points may be accurately represented by only a few variable
coefficients (given the vectors are constant), which makes the
technique ideal to use with regression analysis. For example,
Kutzbach (1967) was able to represent 88% of the total variation
in average January temperatures at 23 stations (grid points)
in North America over a 25-year period by using only five
coefficients and constant vectors. That is, the entire synoptic
scale chart of mean temperature was represented by a
23-element vector, and all of the data were stored in 25
individual 23-element vectors. Thus, Kutzbach was able to reduce
the number of vectors needed to describe the January temperature
field for each year (at the 23 locations) from 25 to 5.
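
The data reduction described above can be sketched numerically. The example below uses synthetic random data with the same dimensions as the Kutzbach example (23 points, 25 cases, 5 retained vectors); it is not Kutzbach's data, and using departures from the mean with an eigendecomposition of the covariance matrix is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 23, 25, 5  # grid points (stations), cases (years), retained vectors

# Synthetic field: three large-scale patterns plus small-scale noise.
patterns = rng.normal(size=(M, 3))
amplitudes = rng.normal(size=(3, N)) * np.array([[3.0], [2.0], [1.0]])
field = patterns @ amplitudes + 0.3 * rng.normal(size=(M, N))

# Departures from the mean at each grid point (rows are grid points).
anomalies = field - field.mean(axis=1, keepdims=True)

# Eigenvectors of the M x M covariance matrix, sorted by decreasing eigenvalue.
covariance = anomalies @ anomalies.T / N
eigvals, eigvecs = np.linalg.eigh(covariance)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

constant_vectors = eigvecs[:, :K]                # M x K, shared by all cases
coefficients = constant_vectors.T @ anomalies    # K x N, one column per case
approximation = constant_vectors @ coefficients  # M x N truncated field

explained = eigvals[:K].sum() / eigvals.sum()
residual = ((anomalies - approximation) ** 2).sum() / (anomalies ** 2).sum()
print(f"fraction of variation retained by {K} vectors: {explained:.3f}")
print(f"fraction left in the residual field:          {residual:.3f}")
```

Each case is then described by K coefficients instead of M grid point values, and the retained and residual fractions sum to one.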

The Empirical Orthogonal Function analysis in this study
is used for data reduction and for representing synoptic fields
numerically. The synoptic-scale forcing upon the tropical
storm may be represented by only a few coefficients obtained
from the analysis. These coefficients may then be used to
forecast statistically the tropical storm movement. In this
manner, the synoptic (dynamic) forcing is incorporated into
the statistical forecasting scheme. Thus, the primary purpose
of this study is to investigate the role of the synoptic
forcing and to forecast tropical storm movement from this
forcing.
II. DATA ACQUISITION AND FIELD DEFINITION


The tropical cyclone tracks and height data used in this
study are identical to those used by Brown (1981). The data
required for an individual case include D-value fields at 850,
700 and 500mb and the storm location history prior to and
after the forecast time. A relocatable 120-point grid is
defined with 5-degree grid spacing in both longitude and
latitude. The grid covers an areal extent of 70 degrees east to
west and 35 degrees north to south. Individual grid points
are numbered as shown in Fig. 2-1. The grid is moved each
map time such that the tropical storm is always located at
grid point 70. A moveable grid can create difficulty in
obtaining composite variable fields due to the longitude
convergence as the storm moves further north. For this study,
this problem is assumed to be of minor importance, and any
composite type fields are computed assuming a flat earth. It
will be shown below that this assumption is reasonable over
the domain used in this study.

D-values are defined (Huschke, 1959) as height deviations
(in meters) from the standard atmosphere height at a constant
pressure surface, and are typically positive in the tropics.

The source of the data is the operational Fleet Numerical
Oceanography Center’s (FNOC) Northern Hemisphere (63 X 63)
analyses at 850, 700 and 500mb. The following selection
conditions are required:
(1) A tropical cyclone of at least tropical storm (35
knots) intensity must be present west of 180°W;

(2) The storm must persist at least 30 hours with tropical
storm intensity or greater, as analyzed by the Joint Typhoon
Warning Center (JTWC), Guam;

(3) The storm must be located between 10° and 25°N. This
requirement was included to ensure the grid did not extend
into the Southern Hemisphere, and was not composed primarily
of mid-latitude D-values. Since the latitudinal domain
is limited, longitude convergence is not a significant
problem at the latitudes of the domain. The distance
from the western edge of the grid to the storm ranges
from 1772 nautical miles at 10°N to 1631 nautical miles at
25°N, to 1474 nautical miles at 35°N and finally to 1157
nautical miles at 50°N. This range of distance is considered
insignificant.

(4) Since the storm position is coupled with the upper
level analysis, only storms existing at 0000 GMT and 1200
GMT are considered;

(5) A 36-hour separation between subsequent positions of
the same storm is required to provide a pseudo-independence
between cases. This independence is a critical consideration
whenever statistical analysis is conducted.
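
The distances quoted in criterion (3) can be reproduced with a short calculation. The sketch assumes that the storm sits 30 degrees of longitude east of the western grid edge and that one degree of latitude spans 60 nautical miles; both values are inferred from the quoted distances and are not stated explicitly in the text.

```python
import math

NM_PER_DEGREE_LAT = 60.0     # assumed: 1 degree of latitude = 60 nautical miles
DEGREES_TO_WEST_EDGE = 30.0  # assumed longitude span from the west edge to the storm


def west_edge_distance_nm(latitude_deg):
    """East-west distance along a latitude circle, shrinking with cos(latitude)."""
    return DEGREES_TO_WEST_EDGE * NM_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg))


for latitude in (10, 25, 35, 50):
    # Truncate to whole nautical miles, as the quoted values appear to be.
    print(f"{latitude:2d} N: {int(west_edge_distance_nm(latitude))} nautical miles")
```

Within the permitted storm latitudes of 10° to 25°N the distance changes by about 8 percent, which is the basis for treating longitude convergence as a minor effect.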

After defining the selection criteria (1) through (5),
the JTWC Annual Typhoon reports from 1967 to 1976 were examined
to select potential cases. These particular years were chosen
because the FNOC Northern Hemispheric D-value fields were
available from Systems and Applied Sciences, Monterey,
California, during these years. Examination of the JTWC reports
yielded 560 potential cases meeting the criteria above. However,
only 540 cases had the required D-value data. Of these
540, there were data problems with an additional 36 cases,
leaving 504 valid cases. Archived D-value data were interpolated
to the 120-point movable grid by the method of Bessel
linear interpolation (Brown, 1981). The phrase "base time"
will be used to define the time of the initial D-value field,
and therefore the forecast. The storm warning position from
JTWC is used as the location at the base time and at all times
prior to the base time, whereas the JTWC best-track position
is used for verification positions. This is a significant
difference from Brown (1981), who used only the best-track
positions for all historical locations. Warning positions
are used because they are the actual locations available at the
time of forecast. The best-track positions are calculated
after the typhoon season, and are not available to the
forecaster in the field. Nevertheless, they are assumed to be
the optimal position and therefore the value that the forecast
scheme tries to replicate.

Storm warning positions are obtained at the base time and
12, 24 and 36 hours prior to the base time. Best track
positions are gathered for future positions in 6-hour increments
from the base time to 84 hours in the future. Therefore, a
storm with complete history has continuously available locations
for 120 consecutive hours. The set of three levels of D-value
fields, four warning positions and 15 best-track positions
comprise the entire set for each case. The number of cases
having X available prior warning positions and Y future best
track locations available is shown in Table 2-1. It is
interesting to note that while there are 504 valid cases meeting
criteria (1) through (5), only 401 cases have all 36 hours of
prior warning position. Furthermore, only 185 cases have both
36-hour prior warning positions and 84-hour future best track
positions available. Among the cases with 36-hour prior
warning positions, the number available increases to 298 when
only 48-hour future best track locations are required, and to
401 when only 30-hour future best track locations at tropical
storm strength are required.
The number of cases with a full 36-hour history is important
when the regression equations are developed.

The composite D-value fields at 500, 700 and 850mb using
all 504 cases are shown in Figs. 2-2, 2-4 and 2-6. Of interest
is the relatively small gradient in the tropics in the
500mb composite. This level has relatively little indication
of a tropical disturbance at grid point 70, since the 500mb
level is near the level at which the surface cyclone becomes
an upper-level anticyclone. The lower level (850 and 700mb)
charts show fairly strong gradients in the D-value field around
point 70. Figs. 2-3, 2-5 and 2-7 show the D-value standard
deviations for all three levels. As expected, the greatest
D-value variation is near the storm location and in the
mid-latitude westerlies to the north. These mean and standard
deviation fields are the fields used in normalizing the data
for each case, by grid point, for use in the Empirical
Orthogonal Function analysis. The 504 cases comprise the
data set from which the Empirical Orthogonal Functions will
be obtained.

Table 2-1. The number of valid cases by prior JTWC warning positions
and future JTWC best track position. See text for details.

Fig. 2-2. The mean (composite) D-value field at 500mb.
Isopleths are deviation in meters from
standard atmosphere. Storm is always located
at grid point 70 (X).

Fig. 2-3. The composite standard deviation D-value
field (in meters) at 500mb. The storm is
always located at grid point 70 (X).

Fig. 2-4. Similar to Fig. 2-2, except for 700mb.

Fig. 2-5. Similar to Fig. 2-3, except for 700mb.


III.  EMPIRICAL  ORTHOGONAL  FUNCTIONS 

A.  BACKGROUND 

The terminology "Empirical Orthogonal Function" (EOF) was
introduced by Lorenz (1956). Actually, EOF analysis is a
variation of the statistical technique of principal components,
which was introduced in its current form by Hotelling
(1933) and was based on an idea of Pearson (1901). Before
delving into the mechanics of EOF analysis, the basic concepts
and meaning of principal components will be presented
geometrically. Geometric meanings presented for principal
components are valid for EOF's, since EOF's differ from
principal components only by a scaling factor.

Principal components aid in explaining interrelations of
individual variables acting on a larger field. Morrison (1967)
presents a concise geometric interpretation of the method.
Principal components may be drawn from data sets in any number
of dimensions, but their meaning is most easily seen in
three-dimensional space. Suppose three variables (X1, X2, X3)
form a trivariate observation space. For example, X1, X2, and
X3 could be the 500mb D-value at gridpoints 1, 2 and 3
respectively. A large collection of simultaneously measured values
of the three variables could be plotted as in Fig. 3-1. The
shaded ellipsoid in the figure represents the scatter plot of
many observations of the three variables. The origin of the
axes is the mean value for each of the three variables. The
first of the three principal components (there will generally
be three unique principal components in three dimensions) is
the major axis of the ellipsoid, denoted as Y1 in the figure.
In other words, the first principal component is the axis in
space that explains the maximum variation from the origin in
the three-dimensional space. For this reason, the term
principal axes is sometimes used instead of principal
components. It is easily seen that this first principal component
can be represented by a vector (and the vector 180 degrees out
of phase) originating at the origin. The second principal
component is the minor axis (Y2) which describes the maximum
amount of variation in the ellipsoid that is not explained by
the first component. The second principal component is also
subject to the constraint that it be orthogonal to the first
component. This is identical to saying the second principal
component is the largest minor axis which is orthogonal to
the major axis. The third principal component is the third
minor axis (Y3) which explains the remainder of the variation
of the ellipsoid. This component is subject to the constraint
that it be orthogonal to the first two components (axes). Thus
the three principal components explain the total variation in
the observation ellipsoid. The components are simply orthogonal
axes, in three dimensions. It is seen from this simplified
example that the technique may be easily extended to
application in multiple dimensions. If the axes are defined by
vectors, it is straightforward to find orthogonal vectors by
standard methods. This orthogonality constraint simplifies
identification and interpretation.

Fig. 3-1. An example of trivariate principal
components. See text for details
(Morrison, 1967).
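
The geometric picture above can be sketched for a synthetic trivariate sample. The data are random numbers generated for illustration (not D-values), and finding the axes by an eigendecomposition of the sample covariance matrix is the standard approach assumed here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, correlated observations of three variables (X1, X2, X3).
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.3, 0.4]])
observations = rng.normal(size=(500, 3)) @ mixing.T

# Move the origin to the mean of each variable.
centered = observations - observations.mean(axis=0)

# Principal axes are the eigenvectors of the covariance matrix.
covariance = np.cov(centered, rowvar=False)
eigvals, axes = np.linalg.eigh(covariance)
order = np.argsort(eigvals)[::-1]            # Y1 first, then Y2, then Y3
eigvals, axes = eigvals[order], axes[:, order]

# Coordinates of every observation along Y1, Y2 and Y3.
scores = centered @ axes

print("axes mutually orthogonal:", np.allclose(axes.T @ axes, np.eye(3)))
print("share of variation along Y1, Y2, Y3:", np.round(eigvals / eigvals.sum(), 3))
```

The variance of the observations along each axis equals the corresponding eigenvalue, so the three axes together account for the total variation of the scatter.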

In M-dimension space, there will be M (or occasionally
fewer) orthogonal components, which are simply the orthogonal
vectors in M space. If there are fewer than M unique
components, the observation variables are overdefined, and two
or more of the describing variables are perfectly correlated.
If this is the case, one of these perfectly correlated
variables may be omitted with no loss of information.
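
The overdefined case can be illustrated with a small synthetic sketch in which the third variable is an exact multiple of the first, so the two are perfectly correlated. The data and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 2.0 * x1  # perfectly correlated with x1

observations = np.column_stack([x1, x2, x3])
covariance = np.cov(observations, rowvar=False)

# Eigenvalues in decreasing order; one of them vanishes.
eigvals = np.sort(np.linalg.eigvalsh(covariance))[::-1]
print("eigenvalues:", eigvals)
```

Only two eigenvalues are meaningfully different from zero, so two components describe all of the variation and the redundant variable can be dropped.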

As mentioned, Lorenz (1956) introduced the terminology
"Empirical Orthogonal Function", and made the application to
the atmospheric sciences. The mathematical method used for
finding the orthogonal components or vectors involves solution
of the eigenvalue problem in M space. EOF's are simply
principal components that have not been scaled by the square root
of the corresponding eigenvalue found in the solution. This
subtle difference is really of little concern. It does cause
a slight modification in the computations, and also slightly
changes the interpretation of the results. This interpretation
difference arises because the loadings (elements) of the
solution eigenvector (principal component) are nothing more than
the correlation of the variables in a given dimension with the
principal axis it defines (Anderson, 1958). No such easy
interpretation of the loadings is possible with EOF's. This
modification is not significant, and the salient points and
geometric interpretation valid for principal components are
likewise valid in EOF analysis; only the lengths of the
orthogonal vectors are different. The mathematical details
will be covered in the next section.
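
The scaling difference described above can be checked numerically. The sketch assumes standardized variables, so that the eigenvectors come from the correlation matrix, and uses synthetic data; under that assumption, a unit-length eigenvector multiplied by the square root of its eigenvalue reproduces the correlations between the variables and the corresponding principal axis.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cases, n_vars = 400, 4

# Synthetic correlated data, standardized variable by variable.
raw = rng.normal(size=(n_cases, n_vars)) @ rng.normal(size=(n_vars, n_vars))
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
correlation_matrix = z.T @ z / n_cases

eigvals, eigvecs = np.linalg.eigh(correlation_matrix)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unit-length eigenvectors (unscaled) versus loadings scaled by sqrt(eigenvalue).
unscaled = eigvecs
loadings = eigvecs * np.sqrt(eigvals)  # column k is multiplied by sqrt(eigvals[k])

# Directly measured correlation of each variable with each principal axis.
scores = z @ eigvecs
measured = np.array([[np.corrcoef(z[:, i], scores[:, k])[0, 1]
                      for k in range(n_vars)]
                     for i in range(n_vars)])

print("scaled loadings equal correlations:", np.allclose(loadings, measured))
print("unscaled vectors have unit length: ",
      np.allclose(np.linalg.norm(unscaled, axis=0), 1.0))
```

The two sets of vectors point in the same directions and differ only in length, which is why the geometric interpretation carries over unchanged.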

EOF analysis normally has been used in two primary
applications in geophysical sciences. These are either as a
map-typing tool, or as a tool for reducing dimensionality and
explaining the variance structure of a large field. For
example, Stidd (1967) uses EOF analysis to describe the
variation in average monthly rainfall in Nevada. In this paper,
Stidd states:

eigenvectors might be regarded as an ultimate development
in the use of orthogonal functions to describe
patterns or arrays of data.

He goes on to show that annual precipitation in Nevada may be
described primarily by one of three basic "components". The
three are: (1) a winter maximum from large scale storms;
(2) a secondary peak during the summer due to thunderstorms;
and (3) a small effect due to the removal and inclusion of
water into the hydrological structure due to snow pack. EOF
analysis allows extraction of each component and allows the
researcher to determine the primary variables driving each of
the components. Additionally, by using a linear combination
of the eigenvectors (components), it is possible to determine
and estimate the rainfall amount in data-sparse and non-observed
regions. This estimation is done by interpolation of
coefficients associated with each eigenvector. These coefficients
will be explained more fully in the next section. Stidd was
able to explain 93% of the total variance in the annual
rainfall in Nevada by using only three eigenvectors and coefficients.
This is compared to the initial estimation which required 12
charts (one for each month). The key points are that Stidd
was able both to isolate the causes behind annual variation
in Nevada rainfall (over all locations in Nevada) and,
additionally, to reduce the data required to make this estimate by
75% (from 12 charts to three). This "gleaning of the forcing
pattern" and data reduction use of EOF's has been used
frequently in meteorological applications. Other examples of
EOF use in this manner are found in Rinne and Karhila (1979)
and Craddock and Flood (1969).

Another application of EOF analysis has been for map
typing. Brown (1981) uses EOF analysis to divide a large sample
of cases into smaller discrete subsets by map typing based on
the coefficients derived from EOF analysis. The primary
objective was to use the subsets of similar cases to form
analogue-type forecasts of tropical cyclone tracks. Accuracy of
forecasts using this map typing scheme is generally less than with
other objective tropical cyclone motion forecasting techniques.

B.  MECHANICS  OF  THE  EOF  METHOD 

The mechanics of EOF analysis presented here follow an
elegant treatment by Kutzbach (1967). The notation used in
this development is defined as follows: a single underscored
variable in lower case letters is a vector (e.g., e), an
uppercase variable with two underscores is a matrix (A), and
a primed vector or matrix is the transpose (e'). The raw
data field (in this study, the 120 grid point fields of
D-values) is formed into a matrix, A. This matrix is
constructed so that each column consists of the 120 observed
D-values for a particular data case. Each row represents the
D-values at the same grid point for all data cases. If there
are N separate data cases (storms), with each case having M
grid point values, A is an M X N matrix representing the
observed D-value fields. The objective of EOF analysis is to
determine the single vector (e) in M dimensions that best
represents all of the N observation vectors. This is
equivalent to saying that one wants to find the vector (e) that
minimizes the summed squared error of all observation vectors
compared to (e). Therefore, EOF analysis may be thought of
broadly as a multi-dimensional extension of a least squares
technique.

The  matrix  A  may  be  constructed  in  one  of  three  ways: 
with  the  actual  data  values;  with  the  departure  from  mean 
data  values?  or  with  the  normalized  departure  from  mean  values. 
There  are  advantages  and  disadvantages  to  using  each  type  of 
initialization  for  the  data  matrix  A.  In  the  first  case,  the 
resultant  EOF's  will  have  magnitudes  on  the  order  of  the  actual 
data, and will effectively represent the actual component
field. Morrison (1967) points out that this type of input
matrix may be dangerous to use if the variables in the different
dimensions vary widely in magnitude. As seen in the mean
and standard deviation charts of the fields (Figs. 2-2 through
2-7), this could be a problem here, since the D-values are
generally quite a bit lower in the northern portion of the grid,
as well as having larger variation in the north. There are
systematic differences in magnitude at different points on
the grid (dimensions). Thus, the grid points with larger
values are given more weight than the grid points with smaller
values,  and  some  of  the  meaning  of  the  resultant  eigenvectors 
is  lost.  For  this  reason,  this  type  of  input  data  was  not 
used.  A  second  potential  form  for  the  data  matrix  A  is  to 
have  the  elements  be  comprised  of  the  deviations  from  the  mean 
value of a given dimension (row). This type of approach is
more  in  line  with  the  classical  principal  components  approach. 
In  this  case,  the  eigenvectors  are  extracted  from  the  covari¬ 
ance  matrix.  This  is  really  the  main  advantage  to  this  form, 
while  the  primary  disadvantages  are  that  the  interpretation 
of  the  resultant  eigenvectors  becomes  muddled  due  to  scaling 
of  the  dimensions  and  again,  there  is  not  equal  weight  between 
dimensions if their magnitudes differ. The third choice for
the  input  data  matrix  form  is  to  use  normalized  departures 
from  the  mean.  This  has  a  disadvantage  in  that  it  may  smooth 
slightly  the  resultant  eigenvectors  (Kutzbach,  1967) .  This 
approach  was  selected  because  the  variations  in  all  dimensions 
are equally weighted in extracting the eigenvectors. In this
study, normalization is accomplished by subtracting the mean
value at that grid point (over all cases), and then dividing
by the standard deviation of that grid point over all cases:

$$ (a_{mn})_T = \frac{a_{mn} - \bar{a}_m}{s_{a_m}} $$

where:

    $(a_{mn})_T$ is the transformed data point,
    $a_{mn}$ is the original data point (D-value),
    $\bar{a}_m$ is the mean of $a$ at grid point $m$ (taken over all $n$ cases),
    $s_{a_m}$ is the standard deviation of $a$ at grid point $m$ (over all $n$ cases).

Brown  (1931)  discusses  in  more  detail  various  methods  of 
normalization  transformations. 
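
The grid-point normalization just described can be sketched in a few lines of Python. This is only an illustration: the function name `normalize_rows` and the small 2 x 4 matrix are hypothetical, and the population standard deviation (divisor N) is assumed because the text does not state which divisor was used.

```python
import statistics

def normalize_rows(A):
    """Normalize each row (grid point) of an M x N matrix to mean 0, std 1.

    A is a list of M rows; each row holds the N case values at one grid point.
    Assumption: the population standard deviation (divisor N) is used.
    """
    normalized = []
    for row in A:
        mean = statistics.fmean(row)
        std = statistics.pstdev(row)
        normalized.append([(value - mean) / std for value in row])
    return normalized

# Hypothetical D-values at two grid points for four cases.
A = [[10.0, 20.0, 30.0, 40.0],
     [-5.0, 5.0, -5.0, 5.0]]
A_norm = normalize_rows(A)
```

After this step every row has zero mean and unit standard deviation, so A A'/N computed from `A_norm` is a correlation matrix.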

After obtaining the normalized input data matrix A (over
all N cases), the next step is to maximize the quantity

$$ \frac{(e'A)^2 N^{-1}}{e'e} \qquad (1) $$

(where, unless otherwise noted, any product of two vectors
or matrices is the dot (inner) product) under the constraint
that

$$ e'e = 1. \qquad (2) $$

Equation (1) is the squared product of an arbitrary vector
(e) and the actual data vectors. Constraint (2) is made simply
to normalize the maximized product. This maximization of (1)
with constraint (2) may be rewritten:

$$ \max\{y : e'e = 1\} \quad \text{where} \quad y = (e'A)^2 N^{-1}, \qquad (3) $$

or

$$ \max\{y : e'e = 1\} \quad \text{where} \quad y = e'AA'e\,N^{-1}. \qquad (4) $$

Defining $R = AA'N^{-1}$, equation (4) may be written as

$$ \max\{y : e'e = 1\} \quad \text{where} \quad y = e'Re. \qquad (5) $$

It is of interest to note that the form of R is the cross
product matrix if A is comprised of the actual data. However,
R is the covariance matrix, or the correlation matrix, if the
input matrix A has elements which are deviations from the
mean or normalized deviations from the mean, respectively.
Premultiplying both sides of equation (5) by e results in

$$ e\,y = Re. \qquad (6) $$

Morrison (1967) shows that maximization of y leads to the
requirement that $|R - yI| = 0$, or else the solution is trivial.
Maximization of (6), therefore, yields the eigenvalue problem,
where y is the eigenvalue.
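
As a rough numerical illustration of equations (5) and (6), the leading eigenvector of R can be approximated by power iteration. Note that power iteration is a substitute technique chosen here for brevity, not the solver used in this study; the function names and the 3 x 4 data matrix are hypothetical.

```python
import math

def cross_product_matrix(A):
    """Return R = A A' / N for an M x N data matrix A (list of M rows)."""
    m, n = len(A), len(A[0])
    return [[sum(A[i][k] * A[j][k] for k in range(n)) / n for j in range(m)]
            for i in range(m)]

def apply_matrix(R, e):
    """Return the matrix-vector product R e."""
    return [sum(r_ij * e_j for r_ij, e_j in zip(row, e)) for row in R]

def leading_eigenpair(R, iterations=200):
    """Power iteration: apply R repeatedly, renormalizing so that e'e = 1."""
    e = [1.0 + 0.1 * i for i in range(len(R))]  # arbitrary starting vector
    for _ in range(iterations):
        Re = apply_matrix(R, e)
        norm = math.sqrt(sum(x * x for x in Re))
        e = [x / norm for x in Re]
    y = sum(e_i * re_i for e_i, re_i in zip(e, apply_matrix(R, e)))  # y = e'Re
    return y, e

# Hypothetical data: 3 grid points (rows), 4 cases (columns).
A = [[2.0, 1.0, -1.0, -2.0],
     [1.0, 1.0, -1.0, -1.0],
     [0.0, 1.0, 0.0, -1.0]]
R = cross_product_matrix(A)
y, e = leading_eigenpair(R)
```

The returned pair satisfies R e = e y to within rounding, which is equation (6) for the largest eigenvalue. Power iteration converges slowly when the two largest eigenvalues are close, so a full symmetric eigen-solver would be preferred for a real 120 x 120 problem.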

Equation  (6)  applies  to  maximization  of  one  eigenvector 
only.  Since  there  are  M  dimensions  in  the  original  problem, 
one  wishes  to  maximize  the  explained  variance  in  each  of  the 
dimensions. Therefore, it is convenient to rewrite (6) for
all vectors in the M-space as

$$ EY = RE. \qquad (7) $$

Here, E is an M X M matrix, rather than a vector as was the
case for (6). It turns out that the elements of Y are the
eigenvalues found solving $|R - yI| = 0$. Each column of E
is an eigenvector associated with a single eigenvalue $y_i$.
It follows from the definition of eigenvectors that they are
orthogonal (uncorrelated). Again, the necessary condition in
finding E is that $E'E = I$, the identity matrix.

Returning to the basic definition of R, it is seen by
substitution that

$$ E'AA'E = NY. \qquad (8) $$

Morrison (1967) has shown that the eigenvector associated
with the largest eigenvalue ($y_1$) is the vector that explains
the maximum variation in R. In fact, the first eigenvector
explains

$$ y_1 \Big/ \sum_{i=1}^{m} y_i \qquad (9) $$

of the total variation in R. The variance unexplained by the
first (largest) eigenvector is the residual. The second
eigenvector is associated with the second largest eigenvalue,
and explains the maximum variation remaining in the residual
field, and is given by

$$ y_2 \Big/ \sum_{i=1}^{m} y_i. \qquad (10) $$

Therefore, the first two eigenvectors together explain

$$ (y_1 + y_2) \Big/ \sum_{i=1}^{m} y_i \qquad (11) $$

of the total variation in R. The process continues with each
successive eigenvector describing the maximum remaining variation
in the residual field. The final eigenvector is simply
any variation in the total mean field left unexplained by the
combination of all previous eigenvectors. As the last eigenvector
explains all of the remaining variation in the field,
the total variation in R is explained by all of the eigenvectors.

Any of the original fields (cases) may be obtained by
calculating the EOF coefficients. These coefficients (called
multipliers by Stidd, 1967, and others) are also orthogonal
and are found by defining:

$$ C = E'A, \qquad (12) $$

where C is an M X N matrix. The nth column of the coefficient
matrix (C) is the orthogonal coefficient vector corresponding
to the nth case. The input data matrix A may be retrieved by

$$ A = EC, \qquad (13) $$

which exactly replicates each data case in A. One of the
primary  advantages  of  EOF  analysis  arises  from  the  fact  that 
the  first  few  eigenvectors  often  describe  a  large  portion  of 
the  total  variance  in  a  sample,  depending  on  the  structure 
and  correlation  in  the  field.  One  may  quite  accurately 
approximate  the  actual  field  by  retaining  only  the  largest 
few  eigenvectors.  Assuming  500  cases,  the  initial  data  matrix 
required  to  describe  the  synoptic  fields  is  a  120  X  500  matrix, 
which  has  60,000  elements.  Using  only  the  first  10  eigenvec¬ 
tors  and  orthogonal  coefficients,  the  original  fields  may  be 
represented  accurately  by  multiplication  of  two  matrices, 
the  first  a  120  X  10  matrix  of  truncated  eigenvectors,  and 
the  second  a  10  X  500  coefficient  matrix.  The  total  number  of 
elements  in  both  matrices  is  only  6,200.  Since  EOF  analysis 
allows  a  high  percentage  of  the  total  variation  to  be  explained 
by  only  the  largest  few  eigenvectors,  it  is  seen  that  the  data 
may  be  accurately  estimated  using  as  little  as  10%  of  the  total 
number  of  data  points. 
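
The storage arithmetic in the preceding paragraph can be checked directly. The helper name `element_counts` is hypothetical; the dimensions are the ones quoted in the text.

```python
def element_counts(m, n, k):
    """Elements in the full M x N matrix versus the truncated pair
    (M x k eigenvectors plus k x N coefficients)."""
    full = m * n
    truncated = m * k + k * n
    return full, truncated

full, truncated = element_counts(m=120, n=500, k=10)
ratio = truncated / full  # fraction of the original storage that is kept
```

For these dimensions the counts are 60,000 and 6,200, so the truncated representation needs a little over 10% of the original elements.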

This significant reduction of dimensionality makes EOF's
a prime tool to use for climatic estimation, and they have been
used as such by Horel (1981), Kidson (1975), Walsh and Mostek
(1980) and Walsh and Richman (1981) among others.

All N observed fields are represented by the linear
combination

$$ a_n = \sum_{i=1}^{m} c_{in} e_i, \quad n = 1, 2, \ldots, N, \qquad (14) $$

where $a_n$ is the nth case. Thus each case may be represented
as a linear combination of the orthogonal coefficients and
elements of the eigenvectors. The first k eigenvectors
(k << m) generally represent a large portion of the total
variance in a. Keeping only the largest k eigenvectors, the
actual cases may be very closely approximated by:

$$ a_n \approx \sum_{i=1}^{k} c_{in} e_i, \quad n = 1, 2, \ldots, N. \qquad (15) $$

If  one  retains  only  significant  eigenvectors,  maximum  infor¬ 
mation  may  be  retained  with  little  complicating  noise.  This 
leads  to  the  obvious  problem  regarding  the  optimal  number  of 
eigenvectors  to  keep. 
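
A minimal sketch of the exact and truncated reconstructions follows, using plain Python lists. The helper names are hypothetical, and the tiny two-point, two-case matrix is an assumed toy example whose orthonormal eigenvectors are known in closed form.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(column) for column in zip(*X)]

def reconstruct(E, A, k):
    """Approximate A from the first k eigenvectors (columns of E)."""
    C = matmul(transpose(E), A)      # C = E'A, all coefficients
    E_k = [row[:k] for row in E]     # M x k truncated eigenvectors
    C_k = C[:k]                      # k x N truncated coefficients
    return matmul(E_k, C_k)

s = 1.0 / math.sqrt(2.0)
A = [[1.0, 2.0],
     [2.0, 1.0]]                     # columns are the two cases
E = [[s, s],
     [s, -s]]                        # orthonormal eigenvectors of AA'
exact = reconstruct(E, A, 2)         # all eigenvectors: recovers A
approx = reconstruct(E, A, 1)        # first eigenvector only
```

With both eigenvectors the product E C returns A to within rounding. With only the first, both cases collapse to the same field (1.5 at each point), which is the information retained by the leading mode in this toy case.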

C.  SELECTING  THE  NUMBER  OF  EIGENVECTORS 

In  the  previous  section,  it  was  demonstrated  how  a  data 
field  may  be  represented  accurately  by  a  linear  combination 
of  only  a  small  number  of  eigenvectors  and  coefficients.  The 
question  of  how  many  eigenvectors  to  retain  is  vital.  Simply 
stated,  the  question  is  at  what  point  does  the  linear  combina¬ 
tion  no  longer  add  signal,  but  only  describe  noise  in  the  data. 
Unfortunately,  there  is  no  single  accepted  answer  to  this 
question.  Several  possibilities  are  presented  here. 

The  classical  principal  component  approach  is  outlined  by 
Morrison  (1967),  and  assumes  a  very  large,  normally-distributed 
sample  for  the  data.  In  this  case,  the  significant  eigenvectors 
may be identified by asymptotic behavior of the eigenvalues.
One seeks those eigenvectors that are significantly different
than zero. Anderson (1963) has shown that sampling problems
using normalized data are much more complex than when non-normalized
departures from means are used. Therefore, the
initial development given here assumes non-normalized data,
because the mathematical description is easier to follow. When
the number of observations is very large, Anderson (1963)
shows the quantity $\sqrt{n}\,(l_i - \lambda_i)$ is distributed normally about a
zero mean, with variance of $2\lambda_i^2$. Here $l_i$ is the sample population
eigenvalue, $\lambda_i$ is the total population eigenvalue, and
n the number of cases. Further, Anderson shows the eigenvalues
are independent of each other. In this case, one may use a
confidence interval approach to determine if the eigenvalues
are significantly different than zero. If an eigenvalue is
not significantly different than zero, the associated eigenvector
describes only random noise. The confidence interval,
given by Morrison (1967), is:

$$ \frac{l_i}{1 + z_{\alpha/2}\sqrt{2/n}} \le \lambda_i \le \frac{l_i}{1 - z_{\alpha/2}\sqrt{2/n}} \qquad (15) $$

where:

    $z_{\alpha/2}$ is the standard two-tail z score ($z = 1.96$
    gives a 95% confidence interval).

The  asymptotic  decision  rule  is  simply  that  the  eigenvector  is 
discarded  unless  the  lower  limit  in  (15)  is  greater  than  zero. 
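
The asymptotic interval can be evaluated as in the sketch below. It assumes the usual large-sample form, with lower limit l / (1 + z sqrt(2/n)) and upper limit l / (1 - z sqrt(2/n)) for a sample eigenvalue l; the function name and the eigenvalue 2.0 are hypothetical.

```python
import math

def eigenvalue_interval(sample_eigenvalue, n_cases, z=1.96):
    """Large-sample confidence limits for a population eigenvalue.

    Assumed form: l / (1 + z*sqrt(2/n)) <= lambda <= l / (1 - z*sqrt(2/n)).
    """
    half_width = z * math.sqrt(2.0 / n_cases)
    if half_width >= 1.0:
        raise ValueError("sample too small for the large-sample interval")
    lower = sample_eigenvalue / (1.0 + half_width)
    upper = sample_eigenvalue / (1.0 - half_width)
    return lower, upper

# Hypothetical sample eigenvalue with the 504 cases used in this study.
lower, upper = eigenvalue_interval(2.0, n_cases=504)
```

For n = 504 the limits sit roughly 11 to 14 percent either side of the sample eigenvalue. The interval is only trustworthy when n is large enough for the asymptotic theory to hold.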
While this method is sound theoretically, and works very
well for large data sets, Preisendorfer and Barnett (1977)
point out that data sets used in meteorological (and oceanographic)
studies are rarely of the size for which asymptotic
behavior begins to emerge. In fact, Preisendorfer and Barnett
suggest that a sample size on the order of 1000 cases may be
required before asymptoticity applies. Since the data set
used  in  this  study  is  much  below  this  size,  the  classical 
asymptotic  selection  approach  for  determining  how  many  eigen¬ 
vectors  to  retain  was  not  used. 

Another approach used throughout the literature (e.g.,
Rinne and Karhila, 1979) involves examination of the natural
logarithm  of  the  eigenvalue.  This  method  is  called  the  LEV 
(Logarithmic  Eigenvalue)  diagram  method.  The  basis  of  this 
method  is  that  the  eigenvectors  for  those  components  that 
describe  signal  have  a  different  structure  than  those  that 
describe  noise.  Furthermore,  it  has  been  noticed  that  the 
structure  change  is  most  easily  noted  when  natural  logarithms 
of  the  eigenvalues  are  examined.  To  use  the  method,  the  eigen¬ 
values  are  first  ordered,  from  largest  to  smallest.  This 
method  will  work  if  there  is  a  distinct  change  in  slope  of  the 
ordered  eigenvalues  at  some  point.  All  eigenvalues  larger 
than  this  slope  change  point  are  retained,  and  all  smaller  ones 
omitted.  While  this  method  apparently  does  well  in  some  cases, 
and  is  exceedingly  simple  to  use,  it  is  not  used  in  this  study 
for several reasons. First, it is not at all clear that a
break in the slope of the eigenvalues at some point is the
demarcation  point  between  those  eigenvalues  that  describe 
signal  and  those  that  describe  noise.  Secondly,  even  assuming 
the  break  in  the  eigenvalue  slope  does  indeed  mark  the  point 
in  signal-to-noise  domination  shift,  the  method  is  scientif¬ 
ically  unsatisfying  because  there  is  little  statistical  jus¬ 
tification  for  its  use. 

Another  method  that  appears  in  the  literature  is  to  select 
the  number  of  eigenvalues  and  vectors  a  priori,  or  select  a 
percent  total  variance  explained  value  as  the  cutoff  point  a 
priori.  Richman  (1980)  presents  several  of  these  methods  in 
detail.  For  example,  Cattell  (1958)  recommends  retaining 
all  eigenvalues  necessary  to  explain  99%  of  the  total  variance. 
Guttman  (1954)  recommends  retention  of  all  eigenvectors  asso¬ 
ciated  with  eigenvalues  larger  than  1.  Both  of  these  methods 
in effect involve probable overfactoring. That is, use of
these methods leads to keeping more eigenvectors than are
actually required to adequately explain the data. This in
and of itself is not serious unless the eigenvalues and vectors
are rotated to better fit the clusters in space (see Richman,
1981), but it does tend to defeat the purpose of EOF analysis.
If overfactoring occurs, one does not receive maximum data
reduction. Since the purpose of this study was to reduce
the synoptic scale forcing fields to only a few easily separable
components to aid in determining typhoon movement, underfactoring
is not a real problem.
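
The two a priori rules mentioned above (eigenvalues larger than 1, and a fixed fraction of total variance) are easy to state in code. The function names and the seven eigenvalues below are hypothetical illustrations, not values from this study.

```python
def count_by_lower_bound(eigenvalues, threshold=1.0):
    """Number of eigenvalues larger than the threshold (1 for normalized data)."""
    return sum(1 for y in eigenvalues if y > threshold)

def count_by_variance(eigenvalues, fraction):
    """Smallest number of leading modes whose summed eigenvalues reach the
    requested fraction of the total variance."""
    total = sum(eigenvalues)
    running = 0.0
    for count, y in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += y
        if running >= fraction * total:
            return count
    return len(eigenvalues)

eigenvalues = [4.0, 2.5, 1.2, 0.8, 0.3, 0.15, 0.05]  # hypothetical, sum 9.0
n_lower_bound = count_by_lower_bound(eigenvalues)    # eigenvalues larger than 1
n_99_percent = count_by_variance(eigenvalues, 0.99)  # 99% of total variance
```

For these made-up values the lower-bound rule keeps 3 modes while the 99% rule keeps 6, which illustrates how strongly the retained count depends on the rule chosen.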
Richman (1980) used a novel approach to determine how many
eigenvectors to retain. He also used rotation of components,
which is discussed in detail in the last section of this chapter.
His criterion was defined as "meaningfulness". That is,
if the component had apparent meaning (if the component field
was interpretable synoptically), the component was retained.
It has been demonstrated (for example, Craddock and Flood,
1969) that higher order eigenvectors and components degenerate
to little more than a series of uncorrelated high and low value
regions. This means that there is some scientific justification
to Richman's method. Nevertheless, it was not used here
because  it  is  entirely  subjective,  and  therefore  could  give 
inconsistent  results  when  used  by  different  researchers. 

Brown  (1981)  used  the  method  of  retaining  the  number  of 
components  that  explain  a  "reasonable  amount"  of  the  total 
variance.  Specifically,  using  the  same  grid  and  data  fields 
that  are  used  in  this  study,  he  carried  out  experiments  in 
map  typing  using  the  largest  10,  15  and  20  of  the  120  eigen¬ 
vectors.  This  selection  approach  is  rather  arbitrary,  since 
there  is  no  objective  way  of  distinguishing  what  the  eigen¬ 
vectors  are  representing  with  respect  to  the  signal-noise 
problem,  and  specifically,  if  any  signal  is  being  omitted. 

The  final  method,  which  is  used  in  this  study,  is  based  on 
a  selection  method  introduced  by  Preisendorfer  and  Barnett 
(1977) .  In  essence,  the  scheme  is  a  Monte  Carlo  approach  to 
determining the number of eigenvectors to keep. It is not
very different from the classic asymptotic approach described
by Morrison (1967). The main difference is that it is assumed
by Preisendorfer and Barnett that not enough cases are available
to use an asymptotic approach with geophysical data bases.
One key assumption is that the true (physical) variables are
normally distributed at all individual grid points. The simulation
input data are normally distributed, with mean zero,
variance one, which is just simulation of point normalized
data. Given these constraints, and using a large number (N ≥ 100
is recommended by Preisendorfer and Barnett (1977)) of simulations,
one can create sufficient numbers of random fields to
simulate  accurately  the  eigenvalues  that  result  if  the  process 
is  purely  random.  In  addition  to  calculating  the  mean  value 
of  the  simulated  eigenvalue,  the  standard  deviation  of  that 
eigenvalue  is  calculated  over  the  100  or  more  simulations.  If 
the  true  physical  eigenvalues  deviate  from  the  simulated  random 
field  eigenvalues  by  more  than  two  (three)  standard  deviations, 
one  is  95%  (99%)  confident  that  the  field  is  significantly 
different  from  a  field  that  is  purely  random.  In  other  words, 
if  deviation  is  by  more  than  two  standard  deviations,  one  is 
reasonably  assured  that  the  eigenvector  is  describing  signal 
rather  than  noise.  The  simulated  eigenvalues  obtained  in  this 
study  will  be  presented  in  the  next  chapter,  along  with  the 
eigenvalues  obtained  from  analysis  of  the  data.  In  using  this 
Monte Carlo method, 504 simulated 120 point random grids were
obtained. The eigenvalues of these random fields were found
and stored. This process was repeated 100 times to obtain the
simulated eigenvalues and standard deviations of the eigenvalues.
These were then compared to the true data eigenvalues.
One caution must be stated concerning use of this method.
Richman (1980) points out that this method has potential to
slightly underfactor. However, this is not of primary concern
here since the potential for underfactoring is only slight.
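
The comparison step of this selection scheme (not the simulation itself) might look like the sketch below. It assumes that counting stops at the first mode failing the test, which the text does not state explicitly; the function name and all numerical values are hypothetical and are not taken from the tables of this study.

```python
def count_significant_modes(data_eigenvalues, random_means, random_stds,
                            n_std=2.0):
    """Count leading modes whose data eigenvalue exceeds the simulated
    random eigenvalue by more than n_std standard deviations.

    Assumption: counting stops at the first mode that fails the test.
    """
    count = 0
    for y, mean, std in zip(data_eigenvalues, random_means, random_stds):
        if y <= mean + n_std * std:
            break
        count += 1
    return count

# Hypothetical values for five modes, ordered from largest to smallest.
data_eigenvalues = [30.0, 12.0, 5.0, 1.9, 1.5]
random_means = [2.2, 2.1, 2.0, 1.9, 1.8]
random_stds = [0.05, 0.04, 0.04, 0.03, 0.03]
retained = count_significant_modes(data_eigenvalues, random_means, random_stds)
```

Here the fourth mode (1.9) does not exceed its threshold (1.9 + 2 x 0.03), so three modes are retained. Passing `n_std=3.0` applies the stricter three-standard-deviation criterion.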

D.  ROTATION  OF  VECTORS 

Rotation  methods  seek  to  rotate  the  eigenvectors  (axes) 
in  space  to  better  fit  data  clusters.  There  is  some  contro¬ 
versy  existing  (Richman,  1980)  as  to  whether  rotation  of  the 
resultant  components  (eigenvectors)  should  be  employed.  Many 
of  the  potential  schemes  have  been  surveyed  in  detail  by 
Richman  (1980) ,  who  describes  some  of  the  specific  strengths 
and  weaknesses  of  the  schemes. 

A very simple example of rotation follows. Suppose that
two distinct data clusters are positioned (in Cartesian two-dimensional
space) at $[1, 2]'$ and $[2, 1]'$. Following the method outlined
earlier in this chapter, the eigenvalues would then be
$[4.5, 0.5]'$ (for non-normalized input data). The eigenvectors would
be $[1, 1]'$ and $[1, -1]'$ respectively. It is noted that the first
eigenvector (which explains 90% of the total variance) bisects
the two data clusters in space. The second eigenvector does
not really fit the data clusters. Even the first eigenvector
does not give a true representation of the clusters in space.
Misrepresentation of this type may be eased by use of rotation.
The two broad classes of rotation that are employed are the
orthogonal and the oblique. Orthogonal rotation pivots the
eigenvectors identically so as to maintain the orthogonal
relationship. It is seen in the simplified case just presented
that an orthogonal rotation would never give a perfect representation
of the input clusters, as the input clusters only
have a 45° angle between them in the two dimensions, and are
assumed to occur with equal frequency. Oblique rotation, on
the other hand, pivots the vectors so as to most closely fit
the data clusters without necessarily retaining the orthogonality
constraint. In the simplified case just presented, the
vectors would be pivoted (within a scaling factor) to $[1, 2]'$ and
$[2, 1]'$. The vectors are no longer orthogonal, nor is it possible
to determine quantitatively the amount of total variation
explained by either of the vectors without exhaustive analysis.
Richman  (1981)  uses  pre-determined  input  fields  to  simulate 
the  principal  component  processes.  He  then  compares  non- 
rotated  components  to  both  orthogonally  and  obliquely  rotated 
components.  His  results  show  obliquely  rotated  components 
give  vastly  improved  delineation  of  the  input  patterns.  He 
then concludes that obliquely rotated components are a better
tool to use for map typing than either orthogonally rotated
or non-rotated components. If the purpose is to identify and
interpret  all  types  of  meteorological  patterns  that  force 
another  event,  obliquely  rotated  components  would  appear  to 
give  superior  results. 
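
The two-cluster example at the start of this section can be checked with the closed-form eigen-solution of a 2 x 2 symmetric matrix. The sketch assumes the clusters sit at (1, 2) and (2, 1), occur with equal frequency, and enter the non-normalized matrix R = AA'/N; the function name is hypothetical.

```python
import math

def symmetric_2x2_eigen(a, b, d):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, d]].

    Requires b != 0 so that (b, y - a) is a valid eigenvector direction.
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    values = [mean + radius, mean - radius]
    vectors = []
    for y in values:
        vx, vy = b, y - a            # solves (a - y) vx + b vy = 0
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    return values, vectors

clusters = [(1.0, 2.0), (2.0, 1.0)]  # assumed cluster positions
n = len(clusters)
r11 = sum(x * x for x, _ in clusters) / n
r12 = sum(x * y for x, y in clusters) / n
r22 = sum(y * y for _, y in clusters) / n
values, vectors = symmetric_2x2_eigen(r11, r12, r22)
fraction_first = values[0] / sum(values)
```

Under these assumptions the eigenvalues are 4.5 and 0.5, the eigenvectors point along (1, 1) and (1, -1), and the first eigenvector carries 90% of the total variance while lying midway between the two clusters.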

Rotation  was  not  used  in  this  study  for  several  reasons. 
Delineation of patterns of meteorological features was not the
specific purpose of this research. EOF's were used in this
study for two purposes. First, they were used to obtain the
orthogonal coefficients which are used in the formulation of
regression equations to forecast tropical storm movement.
Secondly, they were used to reduce the data. The first purpose
of the research makes physical identification and interpretation
of the resultant eigenvalues less critical. It is
the  orthogonal  coefficients  derived  from  the  linear  combination 
of  the  eigenvectors  that  are  used,  not  the  actual  eigenvectors 
themselves.  Nevertheless,  it  is  desirable  to  use  the  resultant 
eigenvectors  with  certainty  to  identify  and  interpret  the  forcing 
features.  It  is  primarily  due  to  the  data  reduction  purpose 
of  this  study  that  use  of  rotated  components  becomes  less 
attractive.  Since  the  amount  of  explained  variance  (by  each 
component)  is  unknown  after  rotation,  the  question  of  how  many 
eigenvectors to retain becomes unclear. In fact, perhaps the
only valid criterion for retention becomes Richman's meaningfulness
criterion. In any case, the problem of determining how
many  vectors  to  retain  becomes  much  more  difficult  after  rota¬ 
tion  has  been  employed. 

An even more insidious problem with rotation of the vectors
is the effect of underfactoring on the resultant vectors.
Richman (1981) also experiments with underfactoring. If too
few vectors are retained and rotated, then the resultant
rotated vectors become combinations of vectors associated with
several actual input data clusters. Therefore, if underfactoring
exists, the same type of bisection that is seen in the worst
possible  case  with  unrotated  vectors  may  occur  with  the  rotated 
vectors.  Since  data  reduction  in  this  study  is  paramount, 
rotation  of  components  seems  ill-advised  at  the  present  time. 

As a final note, Richman's results, and the simplified
results shown at the beginning of this section, clearly show
non-rotated components may not represent the true synoptic
patterns. Conceptually, if the data clusters (input data) are
not symmetric, errors in the EOF representation are less likely.
This is perhaps most easily seen with a simplified example.
If, for instance, in two dimensions, there are two data clusters
occurring with equal frequency, one of the resultant
eigenvectors  will  bisect  the  two  clusters.  This  is  the  case 
in  the  simplified  example  above  since  the  two  cluster  points 
were  assumed  to  occur  with  equal  frequency.  If  the  clusters 
do  not  occur  equally,  this  bisection  does  not  occur.  Richman's 
simulated  fields  were  input  in  mirror-image  pairs,  with  equal 
probability  of  occurrence.  In  this  case,  the  resultant  eigen¬ 
vector  bisected  the  given  input  fields.  True  geophysical 
synoptic  fields  are  not  orthogonal  in  nature  (Barry  and  Perry, 
1973  and  others) .  On  the  other  hand,  it  is  anticipated  that 
true  geophysical  fields  do  not  come  in  matched  opposite  pairs 
that  occur  with  similar  frequency.  It  is  for  this  reason  that 
the  first  several  unrotated  vectors  should  indeed  represent 
actual synoptic variability patterns.

IV. RESULTANT EMPIRICAL ORTHOGONAL FUNCTIONS

The mathematical and theoretical framework for EOF analysis
was developed in Chapter III. In this chapter, the forcing
of  each  eigenvector  on  tropical  storm  movement  is  examined  by 
correlation  of  storm  motion  with  the  strength  of  the  particular 
vector  for  a  given  data  case,  which  is  given  by  the  value  of 
the  orthogonal  coefficient  associated  with  the  vector.  Before 
any  meaningful  analysis  of  physical  forcing  on  typhoon  motion 
may  be  attempted,  the  actual  eigenvectors  must  be  examined. 

Following  the  mathematical  development  of  Chapter  III,  the 
120  X  504  data  matrix  was  normalized  at  each  grid  point,  and 
the  eigenvectors  were  obtained  for  all  three  data  levels  (500, 
700  and  850mb) .  The  resultant  eigenvalues  for  all  three  levels 
were  then  compared  to  the  random  eigenvalues  generated  from 
Monte  Carlo  simulation  using  100  simulations,  as  suggested  by 
Preisendorfer  and  Barnett  (1977) .  These  Monte  Carlo  eigen¬ 
values  were  all  computed  from  120  X  504  matrices  whose  elements 
were  random  normal  variables  with  a  mean  value  of  zero  and  a 
standard  deviation  of  one.  Thus  the  statistical  structure  of 
the  random  fields  is  identical  to  the  real  data  normalized 
fields.  The  value  of  the  eigenvalues  for  the  three  levels  is 
given  in  Table  4-1,  which  also  gives  the  cumulative  percent 
explained  total  variance  for  each  successive  eigenvector.  Table 
4-2  is  a  list  of  the  randomly  generated  eigenvalues  and  their 
standard deviations for comparable modes. If the real data
eigenvalue for a specific mode is greater than the random
eigenvalue plus twice the standard deviation, the eigenvalue
and corresponding eigenvector represent geophysical signal,
and the eigenvector is retained. To facilitate this comparison,
the value of the random eigenvalue plus twice the standard
deviation is also given in Table 4-2. The values of the standard
deviations in Table 4-2 are consistent with Preisendorfer
and Barnett's (1977) results. Comparisons of the three actual
field eigenvalues to those of the random field are conducted
separately, since the number of significant eigenvectors may
be different for each level. The only relationship between
the eigenvectors of the three levels comes from any dynamic
vertical coupling that may exist.

TABLE 4-2

Eigenvalues and standard deviations corresponding to the
modes in Table 4-1 as generated by the Monte Carlo method
(see description in text).

MODE

EIGENVALUE
2.169  2.005  1.894  1.604  .017

EIGENVALUE PLUS
2.258  2.174  2.110  2.065  2.018  1.981  1.944  1.854  1.790
1.713  1.694  1.664  1.639  1.614  1.595  1.231

Several  interesting  features  emerge  from  examination  of 
the  eigenvalues.  The  number  of  eigenvectors  to  retain  is  dif¬ 
ferent  depending  on  the  retention  scheme  chosen.  For  example, 
Guttman's  lower  bound  test  suggests  retention  of  the  first  14 
or  15  eigenvalues  for  these  levels.  Cattell's  99%  retention 
rule  would  indicate  retention  of  more  than  40  modes  at  each 
level.  The  Preisendorfer  and  Barnett  selection  scheme  is  much 
less  conservative,  and  suggests  retention  of  only  10  eigenvec¬ 
tors  at  850  and  500mb  and  11  at  700mb.  Because  the  Preisendorfer 
and  Barnett  method  keeps  fewer  modes,  the  potential  for  under¬ 
factoring  increases.  Since  only  10  or  11  eigenvectors  are  to 
be  retained,  roughly  15%  of  the  variance  in  the  fields  is 
directly  accountable  to  random  fluctuations  (noise) .  This 
amount of unexplained variance is not unrealistic in the
tropics. These errors are most likely due to either initialization
or measurement error in the fields. This is not surprising
because the initialization problem in the tropics is
difficult (weak governing mass-wind balance relationship).
Even more importantly, there is a very small gradient in the
geopotential field, except in the region near the tropical
storm. This would tend to give a greater weighting to any
observational error in the tropics, compared to the mid-latitudes,
where a linear balance initialization with quasi-geostrophic
constraints can be imposed to reduce errors in the height
fields. Since the areal extent of the grid incorporates a
large portion of the tropical synoptic forcing field (Fig. 2-1),
it is entirely conceivable that there is a 15% level of random
error in the D-value fields.

The  500mb  eigenvalues  from  Table  4-1  are  graphically  com¬ 
pared  to  the  Monte  Carlo  simulated  eigenvalues  (Table  4-2)  in 
Fig. 4-1. It is seen that the actual 500mb eigenvalues decrease
very rapidly with increasing mode, which indicates that a large
number  of  the  components  represent  data  clusters  containing 
random  noise.  Graphs  of  the  700  and  850mb  eigenvalues  are  not 
included  because  they  are  very  similar  to  the  500mb  values. 

Preisendorfer  and  Barnett's  assertion  that  asymptoticity 
does  not  apply  for  a  sample  size  of  504  data  cases  may  also 
be examined. If the asymptotic results are valid, the ratio
should be very nearly constant. Here is the mean randomly
generated  ith  eigenvalue,  is  the  standard  deviation  for  the 
ith  mode,  n  is  the  number  of  cases  and  m  is  the  number  of 
grid  points.  The  value  of  this  ratio  is  given  in  Table  4-3 
for  selected  modes.  It  is  seen  that  the  ratio  is  not  con¬ 
stant,  nor  does  it  approach  the  theoretical  value  expected 
for  asymptotic ity.  Thus  it  is  concluded  that  asymptotic 
theory  is  not  valid  for  this  study. 


TABLE 4-3

Test parameter for the asymptotic theory of eigenvalues
is shown for various modes (see text for details).

MODE      1     2     5    10    15    20    40    60   120
RATIO  49.3  56.8  78.6  78.6  88.2  80.9  85.9  84.2  27.3

Based  on  these  tests  for  significant  eigenvectors,  it  was 
decided  to  retain  the  largest  10  eigenvectors  for  all  levels. 
These  first  10  eigenvectors  at  500mb  are  shown  in  Figs.  4-2 
through  4-11  and  will  be  examined  in  detail.  The  first  10 
eigenvectors  for  both  the  700  and  850mb  level  are  shown  in 
Appendix  A,  without  comment.  The  discussion  of  the  first  10 
eigenvectors  at  500mb  will  include  an  interpretation  of  the 
probable forcing that the particular pattern has on the
tropical storm, which is always at grid point 70.

The actual values of the eigenvectors in Figs. 4-2 through
4-11 are non-dimensional, since normalized data are used on
input. The broad scale forcing features of an eigenvector do
have meaning in the standard meteorological sense. Areas of
higher values of the eigenvector may properly be thought of
as high pressure (D-value) regions, areas of lower values as
low pressure regions, and more tightly packed isopleths
indicate stronger flow regions. Finally, it is stressed that
each eigenvector actually represents both the pattern shown and
the exact inverse of the pattern shown. Relative gradients of the
patterns and positions of the closed isopleth features remain
unchanged for the positive or inverse eigenvectors. All following
discussion will be made using the eigenvector pattern
shown; the inverse case will not be discussed. Relevant features
for the inverse pattern may easily be obtained by following
the same reasoning as below.

Eigenvector 1 (Fig. 4-2): This pattern shows a band of
stronger easterlies directly to the north of the cyclone.
Additionally, there is a slight northerly component to the flow
directly upstream of the storm. The forcing of the tropical
cyclone for this type of pattern should be to the west and
south.

Eigenvector 2 (Fig. 4-3): This component shows small gradients
throughout the field, as expected in the tropics. As with
pattern 1, a broad band of easterlies is seen to the north of
the storm, but they are much farther north than for pattern 1.
A primary difference between this component and the first vector
is that there appears to be a low centered south-southwest
of the storm, while this low was to the south-southeast for
vector 1. This component and component 1 both exhibit properties
of planetary scale waves, as they both have very low
wavenumber over the 70 degree longitudinal span of the chart.
This pattern should induce weak forcing to the west and to
the south.

Eigenvector 3 (Fig. 4-4): An entirely different type of
pattern compared to the first two components is seen here. The
vector has a fairly strong area of lower values to the west,
with a small higher valued area south-southeast of the storm.
Another small low is seen well to the northeast corner of the
pattern. Forcing on the storm should be to the north (strongly)
and east (weakly).

Eigenvector 4 (Fig. 4-5): The predominant feature of this
vector is a well developed low to the north and east of the
storm. The storm itself appears to be situated in a strong
flow region between a high and low. The forced motion should
be strongly to the east, with a weak drift to the south.

Eigenvector 5 (Fig. 4-6): A strong high valued area directly
to the north of the storm is the predominant feature in this
eigenvector. The pattern is essentially wavenumber 1 across
the 70 degree span of the chart. The physical analogue of
this vector is difficult to determine. It could well be that
this is a bisection of two distinct data clusters of high
pressure on the outer extremities of the grid, since this pattern
bears strong resemblance to the non-rotated bisection case
simulated by Richman (1981). In any case, the eigenvector is
usable with coefficients that appear in the formulation of
regression equations, and does indeed describe a global
wavenumber 5 pattern. This pattern should force tropical storms
to the west and north.

Eigenvector 6 (Fig. 4-7): This pattern is another
wavenumber 1 across the 70 degree longitude span of the grid
(global wavenumber 5). The dual low centers are generally
similar to the pattern in eigenvector 3. The forced motion
of the tropical cyclone should be to the west, with little
meridional forcing.

Eigenvector 7 (Fig. 4-8): The expected higher degree of
complexity for higher order modes is beginning to show in
this vector. Five well-defined high or low centers are seen
in the pattern. This vector is approximately global wavenumber
7, so that with this eigenvector the slow transition from
large scale to smaller synoptic scales is beginning. The
physical meaning of the pattern is also becoming more
difficult to define. The forcing of the storm should be weakly to
the north and west.

Eigenvector 8 (Fig. 4-9): As with eigenvector 7, there is
a complex pattern of well-defined high and low value centers,
with the storm located in the northern regions of a high
center. Forcing to the east and south is anticipated from this
pattern, although all forced motions should be weak.

Eigenvector 9 (Fig. 4-10): Eigenvector 9 is somewhat
surprising since it has less complexity than the preceding two
eigenvectors. Nevertheless, it is approximately global
wavenumber 7. A strong blocking high center is found directly to
the west of the storm, while the storm itself is on the west
side of a weaker low. It is possible that the blocking high
pattern represents the effect of the 500mb anticyclone east
of the Tibetan Plateau heat low. Motions forced from this
pattern should be weakly to the south and east.

Eigenvector 10 (Fig. 4-11): The final eigenvector retained
in the truncated set of 10 is the most complex. A series of
well developed highs and lows is seen throughout the extent
of the grid. Short range forcing on the storm would come from
a high located south of the cyclone and two strong low centers
flanking the storm. The pattern is wavenumber 2 over the 70
degrees covered by the grid and corresponds to a global
wavenumber 10. This pattern defines even smaller synoptic scale
forcing than the previous patterns. Perhaps coincidentally,
the eigenvector 10 for the 700mb data set (Appendix A) is
virtually identical. This similarity indicates this pattern
is probably a true physical signal, which is vertically coupled
through the mid-troposphere. Motion forced from this pattern
will be to the south with little zonal forcing.
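
The conversion between wavenumber across the chart and global
wavenumber used in these descriptions is simple arithmetic: a
pattern with k waves across the 70 degree longitude span
corresponds to roughly 360k/70 waves around a full latitude
circle. A small sketch (the function name is illustrative only):

```python
def global_wavenumber(chart_wavenumber, span_degrees=70.0):
    """Approximate global wavenumber for a pattern that shows
    `chart_wavenumber` waves across `span_degrees` of longitude."""
    return chart_wavenumber * 360.0 / span_degrees

print(round(global_wavenumber(1)))  # 5
print(round(global_wavenumber(2)))  # 10
```

This reproduces the correspondence quoted for eigenvectors 5, 6
and 10: wavenumber 1 on the chart is about global wavenumber 5,
and wavenumber 2 is about global wavenumber 10.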

It is essential to show how these ten eigenvectors just
described would combine to represent the original field. The
case of 0000GMT 27 August 1967 was selected at random
to demonstrate the reconstruction. At this time, Typhoon
Marge was located at approximately 18°N 125°E with maximum
winds of 125 knots. The actual 500mb D-value field is shown in
Fig. 4-12. The areal extent of the grid is from 43° to 8°N,
and 85° to 155°E. Therefore, this grid encompasses both
tropical and mid-latitude forcing on the storm. A linear
combination of the first ten eigenvectors and the associated
orthogonal coefficients should be adequate to represent the
relevant physical features according to the discussion in
Chapter III.B.

Among the salient features seen in the total field (Fig.
4-12) is a strong blocking high pressure to the northwest of
the typhoon, positioned at about 25°N, 100°E. A 500mb high
pressure at this location is east of the Tibetan Plateau heat
low, which is a stationary feature of the planetary circulation.
There is also a strong high pressure cell (D-values in excess
of +320 meters) to the northeast of the typhoon. This second
high pressure is the westward extension of the subtropical
anticyclone over the western Pacific. Well to the north of
the cyclone is a strong band of mid-latitude westerlies. A
well-developed trough extends from the westerlies into the
tropics and encircles the typhoon.

Fig. 4-2. Eigenvector 1 elements (multiplied by 100) at 500mb
with the tropical cyclone located at the x-position.

Fig. 4-3. Similar to Fig. 4-2 except for eigenvector 2.

Fig. 4-8. Similar to Fig. 4-2 except for eigenvector 7.

Fig. 4-9. Similar to Fig. 4-2 except for eigenvector 8.

As the input data have been normalized, the fields need
to be reconstructed using

$$ d_i = \left( \sum_{n=1}^{m} c_n e_{in} \right) s_i + \bar{d}_i , \qquad i = 1, 2, \ldots, 120, $$

where $m$ is the number of eigenvectors and orthogonal
coefficients used in the reconstruction, $\bar{d}_i$ and $s_i$ are
the mean and standard deviation of the D-value at the ith grid
point, and $d_i$ is the reconstructed value.
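
The reconstruction can be sketched in a few lines of code. This is
a minimal illustration on synthetic data, assuming eigenvectors of
the correlation matrix of the normalized D-values; the array names
(`fields`, `eigvecs`, `mean`, `std`) and the random test fields are
assumptions for illustration and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_points = 454, 120
# Synthetic stand-in for the D-value fields (cases x grid points).
fields = rng.standard_normal((n_cases, n_points)) * 30.0 + 100.0

mean = fields.mean(axis=0)          # mean D-value at each grid point
std = fields.std(axis=0)            # standard deviation at each grid point
z = (fields - mean) / std           # normalized input data

corr = (z.T @ z) / n_cases
eigvals, eigvecs = np.linalg.eigh(corr)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]   # largest eigenvalue first

def reconstruct(case, m):
    """Rebuild one field from its first m eigenvectors and coefficients."""
    coeffs = z[case] @ eigvecs[:, :m]              # orthogonal coefficients
    return (eigvecs[:, :m] @ coeffs) * std + mean  # undo the normalization

def pattern_correlation(case, m):
    """Correlation between the m-mode reconstruction and the actual field."""
    return np.corrcoef(reconstruct(case, m), fields[case])[0, 1]

err_10 = np.linalg.norm(reconstruct(0, 10) - fields[0])
err_all = np.linalg.norm(reconstruct(0, n_points) - fields[0])
print(err_10 > err_all)
```

With all 120 modes the field is recovered exactly, while a
truncated set leaves a residual; `pattern_correlation` is the kind
of measure that could be tabulated against the number of modes.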

The reconstructed field using only the first vector and
coefficient (Fig. 4-13) shows westerlies well to the north
with a ridge circling over the top of the storm from the east.
The general features revealed by use of this eigenvector are
the westerlies and high to the northwest. When the second
and third vectors are included in the reconstruction (Fig.
4-14), little information is gained. This is expected since
these two patterns are not evident in the actual field.

The  inverse  of  the  fourth  eigenvector  has  similarities  to 
the  actual  case  being  reconstructed.  Both  patterns  show  a 
high  pressure  to  the  northeast  and  northwest  of  the  storm 
with  a  trough  in  the  northern  section  of  the  grid.  It  is 
anticipated  that  addition  of  this  eigenvector  should  greatly 
improve  resolution  of  features  on  the  reconstructed  field. 
Changes  in  the  field  are  evident  on  Fig.  4-15,  but  the  overall 
resolution  of  the  features  is  not  dramatically  improved. 
Nevertheless,  inclusion  of  this  vector  does  increase  the  high 
pressure  cell  to  the  northeast  of  the  typhoon,  and  increases 
the  gradient  between  the  mid-latitude  and  tropical  regions. 

The inverse of the fifth eigenvector also has many similarities
to the original field. A significant improvement in the
shape of the general features is seen after the fifth vector
is added (Fig. 4-16). A slight trough appears in the
mid-latitude westerlies and a coupling of the tropical and
mid-latitude trough is seen for the first time. Inclusion of the
next three eigenvectors (vectors 6 through 8) adds very little
to the reconstructed field, and these fields are not shown.
Similarities between eigenvector 9 and the original field include
a sharp trough in the westerlies which connects with a tropical
trough in the vicinity of the typhoon. When this eigenvector is
added to the linear combination of the previous eight, the broad
scale pattern (Fig. 4-17) is delineated much better. There is
general agreement in the positions of the large-scale features
and the gradients between them. Further refinement through use
of higher order modes is necessary to obtain the actual chart.
The difference between the patterns in Figs. 4-12 and 4-18 is,
according to the analysis here, simply random noise.
Nevertheless, with only the first nine eigenvectors the salient
features have emerged, and major forcing from the large scale
on the typhoon is defined. The continued progression in the
reconstructed fields using 10, 20 and 40 eigenvectors is shown
in Figs. 4-18 to 4-20. It is noted that the reconstructed
field is almost exact after 40 terms are included, and some
features due to random noise in the field are reproduced. The
correlation of the reconstructed field using various numbers of
modes with the original field is shown in Table 4-4. It is seen
here that the correlation of the two fields asymptotically
approaches 1 as the number of modes in the reconstruction is
increased. Furthermore, large jumps in the correlation are seen
when the first and ninth eigenvectors are added, and smaller
jumps are seen with inclusion of the third and fourth vectors.
This is in agreement with the reconstruction shown above, with
the exception that the fourth instead of the fifth eigenvector
seems to have a larger impact on the reconstruction.

Fig. 4-13. Reconstruction of 500mb D-value field, 0000GMT, using
the first eigenvector and orthogonal coefficient. This compares
to the true field (Fig. 4-12).

Fig. 4-14. Similar to Fig. 4-13, except first three
eigenvectors are used in reconstruction.

Fig. 4-16. Similar to Fig. 4-13, except first five
eigenvectors are used in reconstruction.

Fig. 4-17. Similar to Fig. 4-13, except first nine
eigenvectors are used in reconstruction.

Fig. 4-19. Similar to Fig. 4-13, except first twenty
eigenvectors are used in reconstruction.

Because inclusion of eigenvectors 1, 3, 4, 5 and 9
seemed to have the greatest impact on the reconstruction, the
orthogonal coefficients associated with these eigenvectors
should have larger magnitudes than the other coefficients for
this case. The values of the first ten coefficients are shown
in Table 4-5. The coefficients associated with eigenvectors
1 and 9 are larger than the other coefficients. Although the
value of coefficient 5 is the third largest in magnitude, it is
of the same magnitude as the coefficients associated with the
second and third eigenvectors. This is explained in that
eigenvector 2 tends to reinforce the pattern of the first vector,
while the third eigenvector reinforces the joint pattern of one
and two. The coefficient associated with the fourth eigenvector
is small for this case, indicating that this pattern really
had little effect on the reconstruction.


TABLE 4-4

Correlation coefficient of the reconstructed field, using
the number of modes indicated, with the actual field being
reconstructed (see text).

NUMBER OF MODES USED     1     2     3     4     5     6     7     8     9    10
CORRELATION           .618  .583  .663  .737  .752  .757  .728  .734  .885  .867

NUMBER OF MODES USED    15    20    25    30    40    50    60    120
CORRELATION           .852  .894  .936  .974  .994  .993  .994  1.000

TABLE 4-5

Values for the first 10 orthogonal coefficients for the
case of 27 August 1967. (See text for details).

Coefficient     1     2      3     4      5      6     7    8     9    10
Value        5.94  1.50  -1.70  -.82  -1.85  -1.03  -.75  .26  2.56  -.38


These ten orthogonal coefficients define the pattern, and
will be used shortly as predictors in regression equations for
forecasting tropical cyclone motion. The hypothesis is that
the forcing of typhoon motion may be determined from the
various eigenvector patterns. As a preliminary test of this
hypothesis, the zonal and meridional components of the typhoon
motion (in nautical miles for various times) are correlated with
the orthogonal coefficients associated with the eigenvectors
(obtained from the base time field). The correlations are
calculated at 12-hour increments for the 12- to 84-hour
displacements using the Pearson product moment (Dixon and Brown,
1979). Because the motion is defined to be positive to the north
and to the west, a positive correlation means increased north or
west forcing, relative to the mean displacement at a given time,
with an increase in the value (not magnitude) of the coefficients.
This holds for both the positive and negative (inverse)
coefficients in that an increase in value for a negative
coefficient (decrease in magnitude) decreases the south or east
forcing, or equivalently increases the north or west forcing. Each
coefficient contributes to the total forcing, and the total
movement is a summation of the forcing in all directions by all
eigenvectors. Correlations are obtained for a dependent set
of 454 cases (or fewer for longer time intervals). Assuming
the motion and orthogonal coefficients are both distributed
normally, Chatfield (1980) shows that the correlation
coefficient for uncorrelated variables is distributed
N(0,1/N). This means that any correlation of magnitude less than
about .09 is not significant (at the 95% level). Tables 4-6 and
4-7 give the correlations for zonal and meridional motion,
respectively.
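
The significance threshold quoted above follows directly from the
N(0,1/N) null distribution: at the 95% level a correlation must
exceed about 1.96/sqrt(N) in magnitude. A short check, in which
the two-sided normal value of 1.96 and the function name are the
only assumptions beyond the text:

```python
import math

def correlation_threshold(n_cases, z_value=1.96):
    """Smallest |r| that is significant when r ~ N(0, 1/n_cases)."""
    return z_value / math.sqrt(n_cases)

threshold = correlation_threshold(454)
print(f"{threshold:.3f}")  # 0.092
```

Because the threshold scales as 1/sqrt(N), it rises somewhat for
the longer time intervals, where fewer cases are available.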

Most of the correlations agree nicely with the instantaneous
forcing of the eigenvectors inferred from Figs. 4-2
to 4-11, although there are surprises. Perhaps the largest
surprise is the shift in meridional forcing in eigenvector 1
as the time interval increases. For times less than 36 hours,
the forcing is the anticipated south forcing. The forcing
at 48 and 60 hours is not significant, indicating the strength
of this pattern at this time level gives little information on
resultant 48- and 60-hour meridional motion. Between 72 and
84 hours, the forcing of this eigenvector actually becomes
significantly northward of the mean 72 to 84 hour meridional
displacement. A possible explanation for this phenomenon is
that this pattern identifies recurving storms. During the
short term, the forcing is to the south, but even more strongly
to the west. The storm then crosses the mean meridional
displacement location after 48 to 60 hours, still well to the west
of the initial longitude. This is not to say the storm actually
moves north of the initial latitude, only that the storm moves
TABLE 4-6

Pearson product moment (correlation) between the orthogonal
coefficient associated with the given eigenvector and the zonal
motion at 12 hour increments. A positive correlation implies
west forcing. Also included is the instantaneous motion
anticipated from the form of the eigenvectors in Figs. 4-2
to 4-11.

MODE  ANTICIPATED                    TIME INTERVAL
      FORCING        12     24     36     48     60     72     84

  1   WEST        +.506  +.530  +.553  +.477  +.495  +.358  +.341
  2   WEST        -.072  -.061  -.059  -.051  -.061  -.092  -.079
  3   EAST        -.109  -.103  -.139  -.074  -.049  -.009  +.001
  4   EAST        -.439  -.412  -.355  -.373  -.371  -.361  -.340
  5   WEST        +.301  +.274  +.283  +.252  -.221  +.284  +.291
  6   WEST        +.101  +.084  +.039  -.043  -.037  -.090  -.084
  7   WEST        -.087  -.079  -.093  -.077  -.098  -.058  -.014
  8   EAST        -.293  -.253  -.265  -.208  -.205  -.240  -.263
  9   LITTLE      -.129  -.095  -.045  -.151  -.132  -.125  -.118
 10   LITTLE      -.018  +.019  +.028  +.031  +.027  +.093  +.073
TABLE 4-7

Similar to Table 4-6, except for meridional motion
and positive correlation implies northward forcing.

MODE  ANTICIPATED                    TIME INTERVAL
      FORCING        12     24     36     48     60     72     84

  1   SOUTH       -.199  -.211  -.242  +.017  +.056  +.194  +.312
  2   SOUTH       -.213  -.184  -.175  -.175  -.158  -.205  -.164
  3   NORTH       +.362  +.359  +.339  +.262  +.214  +.178  +.061
  4   SOUTH       -.183  -.176  -.141  -.111  -.080  -.040  -.012
  5   NORTH       +.075  +.034  +.017  +.009  -.005  +.037  -.047
  6   LITTLE      -.158  -.163  -.136  -.068  -.112  -.102  -.122
  7   NORTH       +.227  +.224  +.202  +.254  +.223  +.195  +.086
  8   SOUTH       +.084  +.084  +.071  +.021  +.040  -.054  -.003
  9   LITTLE      -.047  -.050  -.007  +.155  +.176  +.210  +.194
 10   SOUTH       -.141  -.176  -.207  -.262  -.200  -.143  -.193

north of the expected latitudinal position at around 48 hours,
and then remains north of the expected position. The westward
forcing throughout the entire period is not inconsistent with
recurvature, due to the large initial westward displacement.
By the 72 hour time, the storm is north and west of the mean
track displacement at that time, due only to coefficient 1
forcing. The storm displacement from the base time location
is shown in Fig. 4-21 for all cases that have a 500mb
coefficient 1 less than -9, while Fig. 4-22 is a graph of storm
displacement for those storms with a coefficient 1 greater
than +9. Recurvature is not seen immediately here, and more
sophisticated statistical analysis techniques are required to
verify the hypothesis presented above. Nevertheless, these
two graphs show very nicely how the movement correlates with
the coefficient value.

The  other  correlations  shown  in  Tables  4-6  and  4-7  are 
consistent  with  the  inferred  instantaneous  motion  obtained 
from  the  eigenvectors.  Eigenvectors  3  and  7  (along  with  1) 
have  the  largest  correlation  (forcing)  on  the  meridional 
motion.  Eigenvector  1  has  the  greatest  impact  on  the  zonal 
forcing,  with  vectors  4,  5  and  8  also  showing  significant 
forcing.  Surprisingly,  eigenvectors  2  and  4  also  correlated 
significantly  with  the  meridional  motion.  From  the  results 
shown  here,  the  anticipated  forcing  is  in  good  agreement 
with  the  actual  motion,  and  justifies  use  of  the  coeffi¬ 
cients  as  predictors  in  regression  equations  for  the  storm 
motion. 
Fig. 4-21. Storm displacement from base time position,
in nautical miles, for all storms with 500mb
coefficient 1 less than -9. 12-hour movement
is indicated by a cross.


Fig. 4-22. Similar to Fig. 4-21 except these storms all
have 500mb coefficient 1 greater than +9.
V. REGRESSION ANALYSIS


In the preceding chapter, it was demonstrated that the
orthogonal coefficients associated with eigenvectors give
qualitative insight to physical forcing mechanisms acting on
tropical storms. Therefore, it is hypothesized that it is
possible to use these coefficients to forecast quantitatively
tropical storm motion. A regression approach is appropriate
to investigate this hypothesis. Very briefly, regression
analysis involves using a linear combination of known
quantities (predictors) to estimate the value of an unknown
quantity (predictand). Dixon and Brown (1979) give a concise
summary of regression analysis, while Neter and Wasserman
(1974) provide theoretical background of the technique. In
the initial portion of this chapter, the model is developed,
with model results appearing at the end of the chapter.
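
As a rough illustration of the regression idea, the sketch below
estimates a displacement predictand from a linear combination of
predictors by ordinary least squares. The data are synthetic and
every variable name is hypothetical; the predictor selection used
in this study is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_predictors = 200, 10

# Synthetic predictors (standing in for orthogonal coefficients)
# and a synthetic predictand built from known weights plus noise.
X = rng.standard_normal((n_cases, n_predictors))
true_weights = np.arange(1, n_predictors + 1, dtype=float)
y = 50.0 + X @ true_weights + rng.standard_normal(n_cases)

# Least-squares fit with an intercept column.
A = np.column_stack([np.ones(n_cases), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(x_new):
    """Estimate the predictand for one set of predictor values."""
    return beta[0] + np.asarray(x_new, dtype=float) @ beta[1:]

print(np.round(beta, 1))
```

The fitted weights recover the values used to build the synthetic
predictand, which is all the example is intended to show.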

It was decided that of the 504 total data cases available,
50 would be used as independent cases to test the resultant
equations. Use of 50 cases for the independent data set
is arbitrary, but still gives a large dependent data set. In
the initial set of 504 cases, 185 cases had both complete
past histories (warning positions 36 hours prior to the base
time) and best track positions that extended to 84 hours
beyond the base time. Of these 185 cases, it was decided to
hold 35 cases to comprise part of the independent set, leaving
150 cases with full history in the dependent set. The remaining
15 independent cases were selected from the remaining cases
without complete history. All cases in the independent data set
were selected randomly within their respective history
subsets. This process left 454 potential cases over which the
regression equations were formed. The fifty independent cases
are shown in Table 5-1. It will be shown shortly that the
actual number of cases used to derive the regression
equations is less than 454, due to the specifications of the
predictors.
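
The bookkeeping for this split can be sketched as below. The case
indices and the random seed are hypothetical stand-ins; only the
counts (504 cases in total, 185 with full history, 35 and 15 held
out) come from the text.

```python
import random

rng = random.Random(0)
all_cases = list(range(504))

# Stand-in for the 185 cases with complete past and future history.
full_history = set(rng.sample(all_cases, 185))
partial_history = [c for c in all_cases if c not in full_history]

# Random selection within each history subset.
independent = (set(rng.sample(sorted(full_history), 35))
               | set(rng.sample(partial_history, 15)))
dependent = [c for c in all_cases if c not in independent]
dependent_full = [c for c in dependent if c in full_history]

print(len(independent), len(dependent), len(dependent_full))  # 50 454 150
```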

Predictands for this study are the 12- to 84-h zonal and
meridional displacements of the storms in 12-hour increments.
These distances are determined from the base time JTWC
warning position to the JTWC best-track position at the
predictand time. Positive motion is defined to the north and to
the west, since the majority of the displacements are to the north
and west. As there are 14 predictands, 14 regression
equations are required for each of the three pressure levels for
which synoptic data are available. Because the basic data
are only available at 12-hour intervals, and the analyzed maps
are delayed several hours, the forecast time must be carefully
distinguished from the guidance time. A 12-h forecast based
on 0000GMT data is the forecast position valid at 1200GMT,
whereas a 12-h guidance based on the 0000GMT data would be
issued several hours after 0000GMT and would be valid 12 hours
after issuance. It is estimated that four hours would be
needed to prepare and issue the forecast. Hence, a forecast
issued based on 0000GMT data could only be used in preparing

TABLE 5-1

The independent storms: their dates of occurrence,
position and intensity, and their past warning and future
best track history.

                                                           MAX   HOURS  HOURS
 NO.  NAME      YEAR  MONTH/DATE  TIME     LAT     LON     WIND  PAST   FUTURE

  1   THERESE   1967  --          0000GMT  10.7N   139.9E   40    36     84
  2   VIOLET    1967  --          1200GMT  10.9N   138.3E   65    36     84
  3   GEORGIA   1967  --          0000GMT  22.0N   136.7E   35    36     84
  4   GEORGIA   1967  --          0000GMT  35.9N   150.1E   50    36     24
  5   OPAL      1967  --          0000GMT  26.6N   140.4E   85    36     84
  6   RUTH      1967  --          0000GMT  27.1N   162.8E   55    36     84
  7   DINAH     1967  --          0000GMT  10.6N   138.7E   60    36     84
  8   GILDA     1967  --          0000GMT  10.6N   152.9E   75    36     84
  9   JEAN      1968  --          1200GMT  10.6N   150.6E   85    36     84
 10   KIM       1968  --          0000GMT  17.5N   132.5E   85    36     84
 11   WENDY     1968  --          1200GMT  20.5N   141.9E  130    36     84
 12   AGNES     1968  --          1200GMT  16.1N   155.9E   75    36     84
 13   AGNES     1968  --          0000GMT  23.4N   137.2E   60    36     48
 14   DELLA     1968  --          0000GMT  19.9N   --       --    36     36
 15   CARMEN    1968  --          0000GMT  18.2N   --       --    --     --
 16   JODY      1968  --          1200GMT  11.0N   147.8E  100    36     84
 17   JODY      1968  --          1200GMT  16.6N   135.6E  105    36     72
 18   HELEN     1969  --          0000GMT  23.7N   141.7E   95    36     36
 19   IDA       1969  --          0000GMT  18.8N   145.6E   90    36     84
 20   GRACE     1969  --          0000GMT  26.9N   166.6E   --    12     48
 21   GRACE     1969  --          1200GMT  24.7N   162.8E   70    36     84
 22   BILLIE    1970  --          1200GMT  27.0N   131.3E  110    36     --
 23   JOAN      1970  --          1200GMT  14.4N   117.?E   85    36     84
 24   PATSY     1970  --          1200GMT  15.7N   114.7E   60    36     24
 25   MARGE     1970  --          0000GMT  14.7N   116.9E   55    24     60
 26   VERA      1971  --          1200GMT  18.2N   125.6E   85    36     48
 27   WANDA     1971  --          1200GMT  11.7N   112.1E   40    36     84
 28   BABE      1971  --          0000GMT  19.2N   119.3E   45    36     48
 29   LUCY      1971  --          1200GMT  18.7N   124.7E  125    36     60
 30   TRIX      1971  --          1200GMT  25.7N   151.0E   75    36     84
 31   TRIX      1971  --          1200GMT  25.2N   142.9E   85    36     84
 32   VIRGINIA  1971  --          1200GMT  22.2N   136.9E   60    36     60
 33   WENDY     1971  --          1200GMT  24.1N   158.3E  105    36     60
 34   EMMA      1974  --          0000GMT  15.7N   127.0E   40    36     48
 35   POLLY     1974  --          0000GMT  19.8N   143.5E   75    36     84
 36   AGNES     1974  --          1200GMT  24.9N   151.9E   50    36     84
 37   ELAINE    1974  --          0000GMT  16.9N   127.1E   85    36     84
 38   GLORIA    1974  --          0000GMT  15.6N   131.2E   85    36     84
 39   IRMA      1974  --          1200GMT  14.6N   134.3E   70    36     84
 40   LOLA      1975  --          1200GMT  12.?N   117.0E   45    36     48
 41   RITA      1975  --          0000GMT  26.0N   128.8E   45    36     84
 42   GRACE     1975  --          1200GMT  17.9N   128.8E   30    36     84
 43   RITA      1972  --          0000GMT  21.1N   135.6E   80    36     84
 44   RITA      1972  --          1200GMT  21.?N   134.8E   65    36     84
 45   TESS      1972  --          0000GMT  21.1N   151.7E  110    36     84
 46   ALICE     1972  --          0000GMT  30.2N   144.2E   75    36     36
 47   OLGA      1976  --          1200GMT  12.3N   129.8E   45    36     84
 48   SALLY     1976  --          0000GMT  19.4N   132.0E  100    36     84
 49   THERESE   1976  --          1200GMT  22.4N   136.9E  120    36     84
 50   BILLIE    1973  --          0000GMT  20.9N   125.3E  115    36     84
the 0400GMT guidance. A 12-h guidance will then be valid at
1600GMT. To ensure that an estimate of the position during
the next 72 hours is always available, forecasts are made to
84-h after the base time. All subsequent references to
times will be for forecast rather than guidance timing.

The potential predictors are identical for all of the 14
regression equations, with the exception of any predictors
that are a function of atmospheric level. Predictors are
sought to assess quantitatively the effect of three different
features on storm movement: external (to the storm) physical
forcing, previous movement of the storm, and storm intensity.
Synoptic (and sub-synoptic) external forcing on the storm is
thought to play a large role in storm movement (Brown, 1981
and others). To incorporate the forcing quantitatively, the
orthogonal coefficients associated with the 10 retained
eigenvectors for a particular data case are selected as potential
predictors. One of the primary objectives in this study is
to determine how well these EOF's represent large scale
features.

If the storm is to be forecast properly, prior motion must
also be accounted for (Peterson, 1980). It is necessary to
know toward which direction the storm is moving to determine
what portion of the external forcing will be affecting the
storm. To do this, twelve additional variables representing
past zonal and meridional displacements are added to the set
of potential predictors. All of the prior storm displacements
are based on warning positions to simulate operational
conditions. The six variables for zonal motion are the prior
12, 24 and 36 hour zonal displacements of the storm, the zonal
displacements from 12 hours to 24 and 36 hours prior, and
finally the zonal displacement from 24 to 36 hours prior to
the base time. The time frames for the meridional
displacements are identical.
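The six past-displacement predictors for one coordinate can be sketched as follows. This is a minimal illustration only: the function name, the argument names and the sign convention (later position minus earlier position) are assumptions, and the ordering follows the plat1-plat6 / plon1-plon6 listing of Table 5-2.

```python
def past_displacements(p0, p12, p24, p36):
    """Return the six past-displacement predictors for one coordinate.

    p0, p12, p24, p36 are warning positions (latitude or longitude) at
    base time and 12, 24 and 36 hours before base time.
    Sign convention (an assumption): later position minus earlier position.
    """
    return [
        p0 - p12,   # movement over the 12 hours before base time
        p0 - p24,   # movement over the 24 hours before base time
        p0 - p36,   # movement over the 36 hours before base time
        p12 - p24,  # movement from 24 to 12 hours before base time
        p12 - p36,  # movement from 36 to 12 hours before base time
        p24 - p36,  # movement from 36 to 24 hours before base time
    ]
```

Calling the function once with latitudes and once with longitudes yields the twelve past-movement predictors for a single data case.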

Storm intensity is the third characteristic to be assessed
quantitatively. The preferred form of these data would be a
meso- or microscale analysis of the winds around the storm.
Since this is not available, the JTWC warning maximum winds
are used to indicate intensity. The intensity data are
available for the base time, and at 12, 24 and 36 hours prior
to base time. Therefore, the complete set of potential
predictors includes four predictors for intensity, 12 for past
movement and 10 for the physical forcing. Table 5-2 is a
listing of the 26 potential predictors, along with the names
used to identify each predictor in this study. For a data
case to be used in the formulation of the regression
equations, a complete set of potential predictors and the
proper predictand had to be available. This decreased the
number of cases available for computation of the regression
equations. Actual valid case numbers are presented with the
results of the regression. Since the number of potential
predictors is initially large, the resultant equations need
to be examined carefully to determine if any of these
predictors may be excluded with little information loss. It is


TABLE 5-2

Potential predictors used to develop the regression
equations. The first ten predictors are different for each
of the three pressure levels.

VARIABLE  PREDICTOR  DESCRIPTION
NUMBER    NAME

   1      cof1       The orthogonal coefficient associated with eigenvector 1.
   2      cof2       The orthogonal coefficient associated with eigenvector 2.
   3      cof3       The orthogonal coefficient associated with eigenvector 3.
   4      cof4       The orthogonal coefficient associated with eigenvector 4.
   5      cof5       The orthogonal coefficient associated with eigenvector 5.
   6      cof6       The orthogonal coefficient associated with eigenvector 6.
   7      cof7       The orthogonal coefficient associated with eigenvector 7.
   8      cof8       The orthogonal coefficient associated with eigenvector 8.
   9      cof9       The orthogonal coefficient associated with eigenvector 9.
  10      cof10      The orthogonal coefficient associated with eigenvector 10.
  11      plat1      Storm latitude movement for 12 hours before base time.
  12      plat2      Storm latitude movement for 24 hours before base time.
  13      plat3      Storm latitude movement for 36 hours before base time.
  14      plat4      Storm latitude movement from 24 to 12 hours before base time.
  15      plat5      Storm latitude movement from 36 to 12 hours before base time.
  16      plat6      Storm latitude movement from 36 to 24 hours before base time.
  17      plon1      Storm longitude movement for 12 hours before base time.
  18      plon2      Storm longitude movement for 24 hours before base time.
  19      plon3      Storm longitude movement for 36 hours before base time.
  20      plon4      Storm longitude movement from 24 to 12 hours before base time.
  21      plon5      Storm longitude movement from 36 to 12 hours before base time.
  22      plon6      Storm longitude movement from 36 to 24 hours before base time.
  23      amw0       Storm warning maximum wind at forecast base time.
  24      amw1       Storm warning maximum wind 12 hours prior to base time.
  25      amw2       Storm warning maximum wind 24 hours prior to base time.
  26      amw3       Storm warning maximum wind 36 hours prior to base time.



desirable  to  have  as  few  potential  predictors  as  possible. 
Therefore,  if  it  is  determined  that  any  of  the  potential 
predictors  add  little  to  the  equations,  they  should  be  dropped 
from  the  developmental  set,  and  the  equations  should  be 
rederived  over  the  smaller  set  of  predictors. 

The  next  decision  is  how  to  use  the  predictors  to  create 
the  equations.  Two  primary  possibilities  exist:  all  possible 
predictors  or  stepwise  regression.  All  possible  predictor 
regressions  use  all  predictors  at  once  to  form  the  regression 
equations.  In  this  study,  all  26  predictors  would  be  used 
to  formulate  the  equations.  A  stepwise  regression  creates 
the  regression  equations  by  adding  (or  deleting)  one  predictor 
per  step.  At  each  step,  the  single  predictor  that  is  most 
highly  correlated  with  any  residual  error  from  the  previous 
step  is  added  to  the  predictors  used,  and  the  equations  (and 
residuals)  recomputed.  This  process  continues  until  no  addi¬ 
tional  predictors  meet  a  pre-assigned  significance  tolerance 
level.  Dixon  and  Brown  (1979)  give  further  details  of  the 
procedure.  Typically,  not  all  potential  predictors  are  used. 

A stepwise screening procedure is used here for two
fundamental reasons. First, a stepwise procedure extracts maximum
information out of minimum variables, and variables that add
little information are not used. Second, and more
importantly, Neter and Wasserman (1974) show that if two or more
potential predictors are highly correlated, retention of both
may have a deleterious effect on interpretation of the equations.
The problem is called multicollinearity. Statistically, the
effect is to have little additional increase in the total
explained variance, while decreasing the degrees of freedom
in the equation. Since at least some of the potential
predictors are highly correlated, multicollinearity could be a
problem. By using a stepwise regression approach, the problem is
circumvented. Whenever a stepwise regression scheme is used,
a decision on how many predictors are to be used needs to be
made. Two possible approaches are to use a predetermined
number of predictors, so that the number of terms in each final
equation is identical, or to use all terms that meet a
predetermined significance tolerance level. For this study,
all predictors that significantly reduce the variance are
included in the equations, so that the number of terms in the
various equations differs. A tolerance level (F-ratio) of
4.0 is used for this study (Dixon and Brown, 1979).
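The forward part of such a stepwise selection can be sketched as below. This is a simplified illustration under stated assumptions, not the BMDP routine: it only adds predictors (no deletion step), it uses ordinary least squares from NumPy, and the function names and the `f_enter` parameter are hypothetical. At each step the candidate with the largest F-to-enter is added, and selection stops when no candidate reaches the threshold of 4.0.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of a least-squares fit of y on design."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_stepwise(X, y, f_enter=4.0):
    """Forward selection of columns of X using an F-to-enter threshold.

    X: (n_cases, n_predictors) array; y: (n_cases,) predictand.
    Returns the list of selected column indices in order of entry.
    """
    n, p = X.shape
    selected = []
    design = np.ones((n, 1))            # start with the intercept only
    rss_cur = rss(design, y)
    while len(selected) < p:
        best = None
        for j in range(p):
            if j in selected:
                continue
            trial = np.column_stack([design, X[:, j]])
            dof = n - trial.shape[1]    # residual degrees of freedom
            if dof <= 0:
                continue
            rss_new = rss(trial, y)
            # F statistic for adding one predictor to the current model
            f = np.inf if rss_new <= 0.0 else (rss_cur - rss_new) / (rss_new / dof)
            if best is None or f > best[0]:
                best = (f, j, rss_new)
        if best is None or best[0] < f_enter:
            break                       # no remaining predictor is significant
        selected.append(best[1])
        design = np.column_stack([design, X[:, best[1]]])
        rss_cur = best[2]
    return selected
```

Because a predictor that is nearly a linear combination of those already selected reduces the residual very little, its F-to-enter stays small and it is left out, which is how the stepwise approach limits multicollinearity.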

Finally, the form of the equations, either linear or
polynomial, must be decided. The simplest type of polynomial
regression involves using all first-order predictors, and
nonlinear combinations of the first-order predictors in the
model. For instance, if there are 10 initially defined
potential predictors, then the set of predictors used in polynomial
regression includes all 10 first order terms, all 10 second
order (squared) predictors, plus the 45 nonlinear products of
all potential predictors. The use of polynomial regression
may occasionally be of aid in fitting the predictors to the
predictands when nonlinear cause and effect is anticipated.
Neumann and Leftwich (1977) use a second order polynomial
regression to forecast typhoon movement, although their
predictors do not include synoptic forcing explicitly. With 26
potential predictors, as in this study, the number of
polynomial predictors becomes unwieldy. A further justification
for not using polynomial regression is that the predictands
give no evidence of interacting nonlinearly with the predictors.
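The growth in the number of second-order terms can be checked with a short calculation. The function name below is hypothetical; the count is simply the first-order terms, plus the squared terms, plus all pairwise products.

```python
from math import comb

def second_order_term_count(p):
    """Number of predictors in a full second-order polynomial model:
    p first-order terms + p squared terms + C(p, 2) cross products."""
    return p + p + comb(p, 2)

# 10 predictors: 10 + 10 + 45 terms
ten = second_order_term_count(10)
# 26 predictors: 26 + 26 + 325 terms
twenty_six = second_order_term_count(26)
```

With 26 first-order predictors the candidate set grows to 377 terms, which exceeds the number of dependent cases available at most forecast intervals.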

In summary, 14 linear regression equations are to be
formulated for each atmospheric pressure level, with predictands
being 12- through 84-h zonal and meridional displacements
(in nautical miles) in 12-hour increments. Predictors will
be selected stepwise from a set of 26 potential predictors
over 454 (or fewer) dependent data cases. Fifty cases have been
held back to test the equations.

The regression equations are calculated using the
University of California BMDP computer routine linear stepwise
regression (Dixon and Brown, 1979). Before presenting the
equations, their ability to explain variation in the
predictand is examined by use of the R^2 statistic. This quantity may
be interpreted as the percent explained variance in the
predictand by the regression equation (using the dependent data
cases). The R^2 value for each regression equation is shown
in Table 5-3.
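For reference, the explained-variance statistic can be computed as in the following sketch. The function and argument names are assumptions; the formula is the standard one minus the ratio of residual to total sum of squares.

```python
import numpy as np

def r_squared(observed, predicted):
    """Fraction of the variance of `observed` explained by `predicted`."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```

A perfect fit gives 1.0, and a prediction that always equals the sample mean gives 0.0.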

Several properties are immediately seen from the R^2 values.
First, the zonal equations appear to explain a greater portion


TABLE 5-3

Sample size and R^2 statistic for each zonal and meridional
regression equation by forecast time and atmospheric level.

                        FORECAST INTERVAL (HR)

                12     24     36     48     60     72     84

NUMBER OF
DEPENDENT      351    351    329    256    233    163    150
DATA CASES

ZONAL EQUATIONS

500mb         .794   .725   .685   .613   .568   .556   .444
700mb         .791   .719   .680   .600   .558   .550   .310
850mb         .784   .712   .651   .571   .519   .535   .384

MERIDIONAL EQUATIONS

500mb         .522   .476   .404   .354   .255   .315   .208
700mb         .540   .486   .419   .347   .285   .252   .184
850mb         .502   .463   .365   .323   .255   .259   .103
of the total (zonal) movement variation than do the meridional
equations. Over 75% of the total (zonal) variation in the
12-h movement is explained by the equations at each of the
three atmospheric levels. The maximum meridional variation
explained (54%) is for the 12-h movement using 700mb EOF
coefficients. Matching forecast times and levels (excluding
the 84 hour forecast from the 700mb equations), the zonal R^2
is always at least .24 greater than the meridional R^2 for the
same time period and level. The increased ability of the zonal
equations is expected because there is greater variation in
the zonal movement than the meridional movement. The means
and standard deviations of the zonal and meridional
displacements at the various forecast times are shown in Table 5-4.


TABLE 5-4

Means and standard deviations of the predictands
(in nautical miles) for the dependent
sample. See text for details.

                              FORECAST TIME (HOURS)

                        12     24     36     48     60     72     84

Meridional displacement
  mean                  56    119    181    223    282    316    353
  standard deviation   (50)  (100)  (150)  (165)  (221)  (230)  (267)

Zonal displacement
  mean                  51     93    129    195    225    307    372
  standard deviation   (81)  (176)  (258)  (309)  (376)  (396)  (449)




The mean movement for both directions is roughly the same
magnitude, and indicates an average track toward the
northwest. A more significant difference in the motion is seen in
the standard deviations, which are larger for the zonal motion
than for the meridional motion. As both the zonal and
meridional components contribute approximately the same error
magnitude in the regression equations, the R^2 for the zonal
motion will be significantly greater since there is more
variance to be explained.

The second property seen immediately in the R^2 values in
Table 5-3 is that they decrease rapidly in time for each
pressure level. For the 500mb equations, a general rule of
thumb is that the R^2 decreases by a value of .05 per 12 hour
increment. It is further seen (Table 5-4) that the standard
deviation of displacement increases every 12 hours,
heightening the significance of the decrease of the R^2 in time. Simply
stated, the equations predict movement well in the short term,
but the errors grow rapidly with increasing time.

The final property seen in the R^2 values is that the
accuracy of the equations is not a strong function of the
atmospheric level in the dependent sample case. The 500mb
R^2 values are generally larger than at the other two levels,
although these differences are not significant. A Student's
t-test, assuming non-identical variances in the population,
was conducted with the null hypothesis that there is no
significant difference in the R^2 values at the various levels.
In no case was the test statistic significant at even the
alpha equal .75 level. Therefore, the null hypothesis is
accepted that over the dependent sample there is no
difference in performance of the equations at the different
atmospheric levels.

Tables  5-5  and  5-6  present  the  regression  coefficients 
of  the  500mb  equations  by  direction  of  movement.  For  example, 
the  500mb  meridional  regression  coefficients  for  all  seven 
forecast  times  are  given  in  Table  5-5.  The  first  value  given 
is  the  intercept.  The  final  regression  equation  prediction 
of  displacement  is  obtained  by  summing  over  the  product  of 
all  non-zero  regression  coefficients  and  the  variable  asso¬ 
ciated  with  the  coefficient.  None  of  the  500mb  equations 
use  more  than  10  predictors.  In  seven  of  the  28  equations, 
six  or  fewer  predictors  are  used.  Therefore,  these  equations 
are  very  simple  to  use.  A  past  movement  variable  was  always 
the  first  variable  selected  in  the  stepwise  procedure,  so 
persistence  does  play  a  role  in  the  predicted  movement.  The 
predictions  are  not  simply  persistence  forecasts,  however, 
since  in  general  four  or  five  EOF  coefficient  predictors  are 
chosen  in  each  equation.  Therefore,  forcing  also  plays  a 
crucial  role  in  the  storm  movement.  Finally,  maximum  wind 
predictors  are  of  little  consequence  in  the  final  equations, 
indicating  little  impact  on  the  12-h  (or  greater)  time  scale 
storm  motion  (excluding  short  term  trochoidal  path  oscillation) . 
The  resultant  equations  for  the  700  and  850mb  data  are  shown 
in  Appendix  B.  It  is  also  noted  that  of  the  potential 
TABLE 5-5


procedure.


FORECAST VALID FOR BASE TIME PLUS HOURS


Intercept
Cof1
Cof2
Cof3
Cof4
Cof5
Cof6
Cof7
Cof8
Cof9
Cof10
Plat1
Plat2
Plat3
Plat4
Plat5
Plat6
Plon1
Plon2
Plon3
Plon4
Plon5
Plon6
Amw0
Amw1
Amw2
Amw3


38.334 

.0 

-2.234 

3.848 

-2.641 

.0 

-2.535 

3.182 

.0 

-2*618 

0*.358 

.0 

-0.236 
.  0 

.0  , 

0.246 

-0.038 

.0 

.0 

.0 

.0 

.0 

.0 

.0 

.0 


70.789 
.  0  „ 
-4.435 
8.781 
-5.799 

-5! 279 

6;E°2 

-8*975 

0.634 

.0 

.0 

.0 

.0 

.0 

0.502 
-0. 158 
•  0 
.0 
.0 
.0 

.0  A 
0.319 
.  0 
.0 


723.11a 

-2.886 

-6.148 

13.491 

-8.169 

.0 

-7.191 

10.^90 

*.0 

-18.292 

0.652 


.0 

.0 

0*.257 
.0 
.0 
.0 
•  0 
.o 
.0 

0.518 

.0 

.0 


117. 334 
.  0 

-5.649 
12.635 
.0 
.0 
.  0 

17.320 
.  0 

12.293 
-16. 197 
1.068 

:} 

:8 

.0 

.0 

.0 

.0 

•8 

•2 

•8 
*  o 

0.685 

.0 

.0 


214.492 

-6*633 

12.677 

.0 

.0 

-14.631 

26.948 

.0 

18.113 

1 1  aos 

:°o 

:i 

.0 

•o 

.0 

.0 

•8 

•8 


,al: tII 

‘ll  *.923 
.0 
.0 
.0 

20.428 
.  0 

24. 462 

.0 

1.247 

•2 

*8 

•8 

.0 

:8 

0*.  197 

.0 

•8 

•2 

.0 

1.351 
.  0 
.0 


297.612 

20.376 

*2 

.0 

.0 

.0 

.0 

28.  487 
.0 
.0 

0.656 
.  0 
.0 
.0 
.0 
.0 
.0 
.0 

o!  332 

:8 
.  0 
.0 
.0 
TABLE 5-6


Regression coefficients for the seven zonal
equations using 500mb EOF's. A value of .0 indicates
the predictor was not selected in the stepwise selection
procedure.


                 FORECAST VALID FOR BASE TIME PLUS HOURS

                12        24        36        48        60        72        84

Intercept     16.027    37.064    36.833   105.903   216.515   168.503   286.96?
Cof1           2.678     6.604    13.668    18.466    26.153    19.369    27.91!.
Cof2            .0        .0        .0        .0        .0        .0        .0
Cof3            .0        .0      -6.104      .0        .0        .0        .0
Cof4          -3.635    -7.783   -11.698   -21.928   -32.626   -48.063   -51.194
Cof5           4.239     8.460    12.385    11.074      .0      24.253    32.448
Cof6            .0        .0        .0        .0        .0        .0
Cof7, Cof8    -7.434   -12.328   -22l?32   -221S&1   -26.982      .0     -41.319   -58.377
Cof9            .0        .0        .0        .0        .0        .0        .0
Cof10           .0        .0      13.350      .0        .0      34.037      .0
Plat1           .0        .0        .0      -0.758    -1.058      .0        .0
Plat2           .0        .0        .0        .0        .0      -0.660    -0.836
Plat3           .0        .0        .0        .0        .0        .0        .0
Plat4           .0        .0        .0        .0        .0        .0        .0
Plat5           .0        .0        .0        .0        .0        .0        .0
Plat6           .0      -0.234      .0        .0        .0        .0        .0
Plon1         -0.626    -1.232    -1.593    -1.782    -1.919    -1.798    -1.542
Plon2           .0        .0        .0        .0        .0        .0        .0
Plon3           .0        .0        .0        .0        .0        .0        .0
Plon4           .0        .0        .0        .0        .0        .0        .0
Plon5           .0        .0        .0        .0        .0        .0        .0
Plon6           .0        .0        .0        .0        .0        .0        .0
Amw0            .0        .0        .0        .0        .0       2.179      .0
Amw1            .0        .0        .0        .0        .0        .0        .0
Amw2            .0        .0        .0        .0      -1.165      .0        .0
Amw3            .0        .0        .0        .0        .0      -2.113      .0
predictors, very little information would be lost by
excluding all past displacement variables except for the
12-h period prior to base time. Additionally, of the
intensity predictors, the most frequently selected was the
12 hour prior intensity. Therefore, it was decided to
rederive the equations using only 13 potential predictors
(the 10 coefficients at the given level, Plat1, Plon1 and
Amw1). Results of the equations, in the form of R^2
statistics, derived on the smaller set are given in Appendix 3.

The  remainder  of  the  results  presented  in  this  chapter  refer 
to  the  equations  derived  using  the  complete  set  of  all  26 
potential  predictors. 
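The way a displacement forecast is formed from one column of Table 5-5 or 5-6 (the intercept plus the sum of each non-zero coefficient times its predictor) can be sketched as follows. The function name and the dictionary layout are assumptions, and any coefficient or predictor values passed in an example call are placeholders, not values taken from the tables.

```python
def predict_displacement(intercept, coefficients, predictors):
    """Evaluate one regression equation.

    intercept:    the first value listed in the table column
    coefficients: dict mapping predictor name -> regression coefficient
                  (a value of 0.0 means the predictor was not selected)
    predictors:   dict mapping predictor name -> value for this data case
    Returns the predicted displacement in nautical miles.
    """
    total = intercept
    for name, coef in coefficients.items():
        if coef != 0.0:               # unselected predictors contribute nothing
            total += coef * predictors[name]
    return total
```

One call per forecast interval and direction gives the seven zonal and seven meridional displacements for a given pressure level.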

Results  presented  thus  far  have  been  drawn  from  the 
regression  equations  using  the  dependent  data  set.  A  true 
test  of  a  regression  equation  comes  through  testing  with 
independent  data.  This  testing  is  critical  in  determination 
of  accuracy  of  the  model.  The  JTWC  annual  typhoon  report 
publishes,  in  addition  to  best  track  and  warning  positions, 
the  forecast  errors  for  24,  48  and  72  hour  forecasts.  The 
regression  model  was  tested  with  the  independent  data  and 
is  compared  to  the  official  JTWC  forecast  error,  which 
serves  as  a  benchmark.  Of  the  50  independent  cases,  only 
45  have  JTWC  official  forecasts  at  24  hours,  31  have  offi¬ 
cial  forecasts  at  48  hours  and  only  17  at  72  hours.  Admit¬ 
tedly,  the  sample  size  of  the  independent  storms  is  quite 
small,  but  inferences  on  aptness  of  the  model  may  still  be 


drawn.  Both  the  complete  set  of  results  for  the  independent 
storms,  and  the  homogeneous  set  where  both  JTWC  and  the 
regression  model  errors  are  available  will  be  shown. 

The  overall  performance  (Table  5-7)  of  the  regression 
equations  on  the  entire  set  of  50  independent  cases  is  first 
examined  to  determine  if  there  is  consistency  in  the  fore¬ 
casts  (indicated  by  small  standard  deviations)  and  to  deter¬ 
mine  in  general  how  well  the  equations  forecast  the  motion. 
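A minimal sketch of how the summary statistics of the following tables could be computed is given below. It assumes that the forecast vector error is the straight-line distance obtained from the zonal and meridional error components, and that the sample (n - 1) standard deviation is used; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
import math
import statistics

def vector_error(zonal_error, meridional_error):
    """Magnitude (n mi) of the forecast position error vector."""
    return math.hypot(zonal_error, meridional_error)

def error_summary(zonal_errors, meridional_errors):
    """Mean and sample standard deviation of the vector errors
    over a set of cases (requires at least two cases)."""
    errors = [vector_error(dx, dy)
              for dx, dy in zip(zonal_errors, meridional_errors)]
    return statistics.mean(errors), statistics.stdev(errors)
```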



TABLE 5-7

Mean and standard deviation forecast vector error
(nautical miles) of 24, 48 and 72 hours for the
set of 50 independent storms.

                               HOUR FORECAST

                            24        48        72

Sample size                 50        43        36

500mb forecast error
  mean                     88.4     176.4     277.4
  standard deviation       62.5     113.5     167.4

700mb forecast error
  mean                    110.1     189.3     318.7
  standard deviation       91.3     120.5     178.7

850mb forecast error
  mean                    114.9     205.4     358.0
  standard deviation      105.8     146.1     219.2

The 500mb equations outperformed the other two equation sets
by a wide margin, which is surprising. Similar differences
between levels did not appear in the errors of the dependent
sample, given in Table 5-8. A possible explanation is that
there is a greater variation in the synoptic forcing fields
at 500mb. This allows the 500mb equations to be less
susceptible to large forecast errors in cases where the predictors
have extreme values. It turns out that with few exceptions,
the 700mb errors are similar to the 500mb errors. Where the
700mb equations performed poorly, the results were much
worse than the 500mb equations. Therefore, it appears that
(at least over the independent cases) the 500mb equations
have a smaller likelihood to give a large forecast error.
This hypothesis needs to be tested more thoroughly as
additional data become available.


TABLE 5-8

Mean and standard deviation forecast vector error
(nautical miles) of 24, 48 and 72 hours for the
set of 454 dependent storms.

                             FORECAST INTERVAL

                            24        48        72

Sample size                351       255       164

500mb forecast error
  mean                     91.5     203.3     298.7
  standard deviation       72.7     113.7     152.4

700mb forecast error
  mean                     92.6     210.6     293.7
  standard deviation       71.9     115.8     121.5

850mb forecast error
  mean                     95.2     210.7     383.4
  standard deviation       71.6     121.5     232.2



The next step in examination of the independent data
results is to compare the results of EOF regression forecasts
to the official JTWC forecasts, for those cases in which this is
possible. The mean and standard deviation errors for these
valid cases, and the benchmark JTWC official forecast error
statistics are shown in Table 5-9. A superior 500mb scheme
is again evident. More importantly, it is seen that the standard
deviation of error for the EOF regression scheme is less
than for the JTWC official forecasts, which indicates the
EOF regression scheme is less likely to have a large forecast
error. The combination of small mean error and small standard
deviation indicates the EOF scheme outperforms the JTWC
official forecast. The 700 and 850mb equation forecasts were
again poorer than the 500mb forecasts, and appear to be about
equal to the JTWC forecasts.

Finally,  the  EOF  regression  scheme  is  compared  to  the 
JTWC  official  forecast  on  a  case-by-case  basis  in  Figs.  5-1 
through  5-9.  Any  points  lying  above  the  straight  line  on 
the  graphs  represent  cases  in  which  the  EOF  scheme  out¬ 
performed  the  JTWC  official  forecasts.  The  850mb  results 
(Figs.  5-3,  5-6  and  5-9)  show  little  differences  between  the 
schemes.  The  700mb  equations  (Figs.  5-2,  5-5  and  5-8)  show, 
in  general,  a  better  forecast  by  the  EOF  scheme,  as  a  bulk 
of  the  points  lie  above  the  no  difference  line.  The  overall 
comparison  statistics  appear  to  have  been  affected  by  a  few 
large  forecast  errors,  especially  at  24  hours.  This  tendency 
24 HOUR 850MB ERROR

Fig. 5-3. Similar to Fig. 5-1, except the 850mb EOF
regression forecast is compared to JTWC official
forecast for a 24-hour forecast.


48 HOUR 500MB ERROR

Fig. 5-4. Similar to Fig. 5-1, except the 500mb EOF
regression forecast is compared to JTWC official
forecast for a 48-hour forecast.


Fig. 5-5. Similar to Fig. 5-1, except the 700mb EOF
regression forecast is compared to JTWC official
forecast for a 48-hour forecast.


Fig. 5-6.


Fig. 5-7. Similar to Fig. 5-1, except the 500mb EOF
regression forecast is compared to JTWC official
forecast for a 72-hour forecast.


Fig. 5-8. Similar to Fig. 5-1, except the 700mb EOF
regression forecast is compared to JTWC official
forecast for a 72-hour forecast.


72 HOUR JTWC ERROR

72 HOUR 850MB ERROR
toward large errors does not appear as dramatically in the
500mb forecasts (Figs. 5-1, 5-4 and 5-7). The superiority
of the EOF forecasts to the JTWC official forecasts needs
to be examined over a larger set of independent data.

One final point of interest on these figures is that
both the 48-hour 850mb and 72-hour 700mb forecasts have an
unusually shaped clustering of EOF regression errors at
about the 150 n mi error level. No physical explanation
for this clustering is known. It is very likely the event
is an artifact of the data. It is, nevertheless, interesting,
and worth closer examination as more data become available.

A  final  graphical  representation  of  the  differences  in 
forecasting  methods  is  shown  in  Figs.  5-10  through  5-12. 

These  graphs  are  divided  by  atmospheric  level,  and  on  each 
are  the  JTWC  error  over  the  independent  sample,  the  EOF 
regression  forecast  over  the  complete  and  homogeneous  inde¬ 
pendent  sample  as  well  as  the  EOF  forecast  over  the  dependent 
sample  plotted  as  a  function  of  forecast  time.  Once  again, 
the  EOF  regression  scheme  forecast  appears  superior  over  both 
the  short  and  long  term  for  the  500mb  equations. 


FORECAST ERROR - 500MB

Fig. 5-10. Comparison of the JTWC official forecast
over the independent data set, as well as
the complete and homogeneous independent
EOF regression set and the dependent set
errors. All EOF results computed from
500mb equations.


HOUR FORECAST

Fig. 5-11. Similar to Fig. 5-10, except EOF regression
results obtained from 700mb equations.


O JTWC FORECAST
A HOMOGENEOUS INDEPENDENT SET
C INDEPENDENT SET


HOUR FORECAST

Fig. 5-12. Similar to Fig. 5-10, except EOF regression
results obtained from 850mb equations.


VI. POTENTIAL FOR USE WITH INDEPENDENT DATA


Based on the results of the previous section, it appears
that EOF regression forecasting has potential for improving
forecasts of tropical storm movement. Using a limited
independent data set, the method has been shown to be an
improvement on the JTWC official forecasts. There are still
unanswered questions concerning use of the model operationally
on independent storms. The regression equations were derived
using orthogonal coefficients derived from one set of
eigenvectors. The regression equations derived are strictly valid
only for tropical cyclone cases in which the coefficients
are obtained from these identical vectors, so that the
coefficients have a consistent meaning for each storm. If a new
case is added to the dependent set, the set of vectors no
longer exactly explains the maximum variance of all of the
observations. Therefore, the stability of the eigenvectors
and coefficients must be examined by determining whether the
vectors and coefficients remain nearly the same if additional
cases are added. This stability will be examined
theoretically, and by a simplified experiment.

The  set  of  dependent  eigenvectors  is  defined  as  those 
vectors  obtained  from  the  original  data  set.  Independent 
vectors  are  obtained  from  the  combined  set  of  original 
dependent  cases  plus  the  new  independent  case.  If  the 
eigenvectors for the dependent data set are very close to
the  eigenvectors  for  the  independent  set,  then  little  error 
will  be  introduced  by  using  the  dependent  eigenvectors  to 
compute  the  coefficients  for  the  independent  case.  In  this 
case,  the  independent  case  coefficients  may  be  used  directly 
in  the  regression  equations  as  initially  derived.  If  the 
eigenvectors  are  not  consistent,  the  regression  equations 
must  be  re-derived  for  every  new  forecast,  including  the 
recomputation  of  a  new  set  of  eigenvectors  and  coefficients 
using  all  data  cases.  Because  of  the  large  amount  of  compu¬ 
tation  in  this  case,  it  is  highly  desirable  that  the  coeffi¬ 
cients  and  vectors  are  consistent  for  independent  data. 

As  in  Chapter  III,  the  eigenvectors  are  derived  from 
solving  the  eigenvector  equation  using  the  known  matrix  R, 
where  R  is  the  correlation  matrix  of  the  normalized  grid 
points: 

$$R = N^{-1}\,S\,S' \qquad (1)$$

R  is  a  square  matrix  of  order  equal  to  the  number  of  dimen¬ 
sions  (grid  points) ,  M.  The  set  of  eigenvectors  constructed 
over  the  dependent  sample  should  theoretically  be  stable  if 
N  (number  of  individual  cases)  is  large.  That  is,  addition 
of  a  single  independent  case  should  have  very  little  effect 
on  the  shape  of  the  observation  surface  in  space.  Inclusion 
of  an  additional  data  case  changes  R  by: 
$$R_{NEW} = \frac{N}{N+1}\,R_{OLD} + \frac{1}{N+1}\,a\,a' \qquad (2)$$


where $R_{NEW}$ is the new (independent) correlation matrix after
addition of the new observation case, $R_{OLD}$ is the original
(dependent) correlation matrix, N (N+1) the number of cases
prior to (after) inclusion of the new case, and a is the
(M X 1) vector of normalized D-values for the independent
case. If N is initially very large, the term a a'/(N+1) in
(2) is negligible compared to the first term, since the
normalized observation elements are rarely greater than two
or three. Therefore, to a very close approximation,


$$R_{NEW} \approx R_{OLD}, \qquad (3)$$

and  the  eigenvalues  and  vectors  obtained  from  the  dependent 
data  should  be  almost  identical  to  those  obtained  over  all 
cases. 
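
The update in (2) can be checked numerically. The following is a minimal sketch, not the computation used in this study: it assumes synthetic random data in place of the normalized D-values, a small hypothetical grid (M = 12), and that the normalization constants are held fixed when the new case is added. The function names are illustrative only.

```python
import numpy as np

def correlation_matrix(S):
    """Equation (1): R = S S' / N for an (M x N) matrix of normalized values."""
    return S @ S.T / S.shape[1]

def updated_correlation(R_old, a, N):
    """Equation (2): fold one new normalized case a (length M) into R_old."""
    return (N / (N + 1)) * R_old + np.outer(a, a) / (N + 1)

rng = np.random.default_rng(0)
M, N = 12, 400                     # hypothetical sizes, smaller than the 120-point grid
S = rng.standard_normal((M, N))    # synthetic stand-in for the dependent sample
a = rng.standard_normal(M)         # synthetic stand-in for the new case

R_old = correlation_matrix(S)
R_new = updated_correlation(R_old, a, N)
R_direct = correlation_matrix(np.column_stack([S, a]))   # recomputed from all N+1 cases

print(np.allclose(R_new, R_direct))      # the update should match direct recomputation
print(np.abs(R_new - R_old).max())       # largest element change, of order 1/(N+1)
```

Since each element of a a' is bounded by the product of two normalized values, the change in any element of R shrinks roughly as 1/(N+1), which is the basis for approximation (3).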

The above theory was tested with 500mb data using
dependent samples of N = 50, 100, 150, 200, 300, and 400
cases with 33 independent cases. The 33 independent case
orthogonal  coefficients  were  computed  in  two  ways: 

(1) As a control, the independent case was added to the
dependent sample, R computed, and the true eigenvectors and
orthogonal  coefficients  recalculated.  Therefore,  33  separate 
sets  of  eigenvectors  were  computed.  The  eigenvectors  and 
orthogonal  coefficients  are  the  values  that  minimize  the 
deviation  from  the  mean  state  for  all  of  the  data. 
(2) The test method involved computing the eigenvectors
only  once  from  the  dependent  set  (N  cases) .  These  vectors 
were  then  used  to  compute  the  orthogonal  coefficients  for 
the  independent  cases.  If  regression  equations  are  not  to 
be  re-derived  for  every  new  operational  forecast,  the  coeffi¬ 
cients  in  the  test  method  should  be  nearly  identical  to 
those  from  the  control. 

Method (2) requires considerably less computer time;
however, the question is whether the coefficients are sufficiently
accurate. Only the first ten coefficients are
examined since they represent the primary contribution to
the 500mb height fields. The comparisons for the first four
coefficients are shown in Figs. 6-1 through 6-4. The
quantity 


$$Y_i = \left| \mathrm{Cof}_{i1} - \mathrm{Cof}_{i2} \right| \qquad (4)$$

is summed over the 33 independent cases. $\mathrm{Cof}_{i1}$ is the ith
coefficient (1 to 10) computed using method (1) and $\mathrm{Cof}_{i2}$
is the ith coefficient computed using method (2). The first
two moments of $Y_i$ are examined to determine the stability of
the coefficients. As N increases, the standard deviations
of the differences in the coefficients should become smaller.
The expected "funnel-shape" with increasing N is seen
clearly in the first orthogonal coefficient (Fig. 6-1),
while coefficients 2 and 3 (Figs. 6-2 and 6-3) tend to have
the expected shape only for N greater than 100. For the
N = 50 case the mean error for both coefficients 2 and 3
is very large compared to the coefficient size (normally
less than ten). This indicates the first three coefficients
may be derived from the dependent set of eigenvectors determined
from as few as 100 cases. An unexpected result is
found with the fourth coefficient (Fig. 6-4), when N = 400
(also at N = 100). The large standard deviation indicates
that at least some of the independent cases have very large
error in this coefficient. A similar indication of unstable
coefficients also occurs in the sixth, seventh and eighth
coefficients.

Fig. 6-2. Similar to Fig. 6-1 except for coefficient 2.

Fig. 6-3. Similar to Fig. 6-1 except for coefficient 3.

Fig. 6-4. Similar to Fig. 6-1 except for coefficient 4.

The source of the error in the calculation of the coefficients
was found to be due to the structure of the characteristic
equation. Any single vector that is a solution
eigenvector also represents infinitely many other vectors
that are also solutions, and which differ only by a constant
scaling factor (positive or negative). In EOF analysis, the
coefficients  depend  upon  the  numerical  values  (and  signs)  of 
the  eigenvectors.  If  one  or  two  of  the  vectors  change  signs 
during  numerical  solution  of  the  eigenvectors,  then  the 
coefficients  must  also  reverse,  which  changes  the  EOF 
reconstruction.  It  is  important  to  notice  that  the  sign 
reversal  actually  occurs  in  deriving  the  new  eigenvectors 
when  the  new  independent  case  is  added.  In  certain  cases, 
the  sign  of  the  coefficient  changes,  although  the  magnitude 
of the coefficient remains almost the same. In the cases
in  which  some  of  the  eigenvectors  reversed  signs,  the  error 
between  coefficients  is  large.  Even  for  these  cases,  the 
difference  in  the  absolute  values  of  the  coefficients 
remains  small.  This  is  demonstrated  in  Fig.  6-5,  in  which 
the  coefficient  4  differences  are  based  only  on  the  magnitude 
of  the  coefficients  from  the  control  and  test  methods.  Large 
errors  in  the  other  coefficients  are  similarly  reduced  when 
the  error  differences  are  between  absolute  values  of  the 
coefficients.  Once  the  eigenvectors  and  coefficients  are 
derived  from  the  dependent  set,  and  the  associated  regression 
equations  are  generated,  this  set  of  eigenvectors  must  be 
used  with  any  independent  cases.  Even  though  the  dependent 
set  may  be  quite  large,  the  addition  of  a  single  new  case 
will  introduce  the  possibility  of  a  sign  change  in  one  of 
the  eigenvectors,  and  a  reversal  in  sign  of  the  coefficients. 
This  would  invalidate  the  original  regression  equation  set, 
and  require  a  re-derivation  of  both  the  eigenvectors  and 
the  regression  equations  with  each  new  entry  into  the 
sample. 
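
The effect of a sign reversal can be illustrated with a small numerical sketch. The data are synthetic, the 8-point grid is hypothetical, and the regression weight b is made up for the example; none of these values come from this study.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 300                                        # hypothetical sizes
S = (0.6 ** np.arange(M))[:, None] * rng.standard_normal((M, N))
R = S @ S.T / N

eigvals, eigvecs = np.linalg.eigh(R)
lam, e = eigvals[-1], eigvecs[:, -1]                 # leading eigenvalue and eigenvector

# Both e and -e satisfy the eigenvector equation R e = lambda e.
plus_ok = np.allclose(R @ e, lam * e)
minus_ok = np.allclose(R @ (-e), lam * (-e))

a = rng.standard_normal(M)                           # synthetic normalized observation
c_plus, c_minus = e @ a, (-e) @ a                    # coefficient under each sign choice

b = 2.5                                              # hypothetical regression weight
term_plus, term_minus = b * c_plus, b * c_minus

print(plus_ok, minus_ok)
print(c_plus, c_minus)            # equal magnitude, opposite sign
print(term_plus, term_minus)      # the regression term reverses as well
```

The magnitude of the coefficient is unchanged by the reversal, in line with the behavior described for Fig. 6-5, while any regression term built on that coefficient changes sign.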

The  reversal  in  sign  of  the  coefficients  and  vectors  is 
probably  due  to  computer  round-off  error.  Solution  of  a  120 
dimension  eigenvalue  problem  requires  simultaneous  solution 
of 120 homogeneous equations, which is an extremely ill-conditioned
problem (Gerald, 1977). The probability of
catastrophic round-off error increases dramatically as the
number of dimensions increases. However, this reversal
problem  is  not  significant  in  the  study,  as  long  as  the 
coefficients  for  independent  cases  are  calculated  from 
dependent  eigenvectors. 

Further  attempts  to  isolate  the  conditions  under  which 
this  reversal  occurs  were  without  success.  Random  tests 
were  conducted  in  3,  5,  9  and  20  dimensions.  Not  until 
dimension  size  reached  20  were  the  first  reversals  noticed. 
The  fact  that  the  reversal  does  not  occur  until  higher 
dimension  systems  are  used  is  consistent  with  the  argument 
above,  because  the  greater  the  number  of  dimensions,  the 
greater  the  probability  for  catastrophic  round-off  error. 

Because  the  coefficients  calculated  by  the  two  methods 
have  consistent  magnitudes,  it  may  be  concluded  that  the 
coefficients  computed  for  independent  cases  using  the  same 
dependent  eigenvectors  will  introduce  very  little  error  to 
the  movement  forecast.  Thus,  implementation  of  these  EOF 
regression  forecasts  with  independent  cases  becomes  straight¬ 
forward.  Only  two  major  operations  are  required.  First, 
the  EOF  orthogonal  coefficients  from  the  dependent  set  of 
eigenvectors  are  stored.  This  involves  multiplication  of 
a (10 X 120) transpose matrix of truncated eigenvectors and
the  (120  X  1)  normalized  observation  vector,  which  gives 
the  ten  coefficients.  The  second  step  involves  simple 
substitution  of  the  independent  coefficients  into  the 
regression  equations.  The  same  eigenvectors  and  eigenvalues 
may be used indefinitely on independent storms, although it
is  recommended  the  regression  equations  be  updated  at  the 
conclusion  of  each  typhoon  season. 
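
The two operations described above can be sketched as follows. This is an illustration only: the eigenvectors, grid-point means, standard deviations and regression weights are synthetic placeholders, the function names are invented, and the sketch uses only EOF coefficients as predictors, whereas the regression equations in this study also include past storm positions and intensity.

```python
import numpy as np

def eof_coefficients(field, grid_mean, grid_std, eigvecs_trunc):
    """Normalize a (120,) field and project it onto the (120 x 10) eigenvectors."""
    a = (field - grid_mean) / grid_std
    return eigvecs_trunc.T @ a                  # (10 x 120) times (120,) gives (10,)

def regression_forecast(intercept, weights, predictors):
    """Evaluate one regression equation; unselected predictors carry weight 0."""
    return intercept + weights @ predictors

rng = np.random.default_rng(2)
n_grid, n_keep = 120, 10
# Synthetic orthonormal columns standing in for the stored dependent eigenvectors.
eigvecs_trunc, _ = np.linalg.qr(rng.standard_normal((n_grid, n_keep)))
grid_mean = rng.normal(5800.0, 30.0, n_grid)    # made-up grid-point means
grid_std = rng.uniform(20.0, 60.0, n_grid)      # made-up grid-point standard deviations

# Build a field from known coefficients so the projection can be checked.
c_true = rng.standard_normal(n_keep)
field = grid_mean + grid_std * (eigvecs_trunc @ c_true)

coefs = eof_coefficients(field, grid_mean, grid_std, eigvecs_trunc)
weights = np.zeros(n_keep)
weights[[1, 2]] = [3.6, 9.1]                    # made-up weights, for illustration only
displacement = regression_forecast(55.7, weights, coefs)
print(coefs.round(3), displacement)
```

Only the stored means, standard deviations, truncated eigenvectors and regression weights are needed at forecast time; no eigenvalue problem is solved for the new case.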
VII. CONCLUSIONS AND FUTURE APPLICATIONS


It  has  been  shown  that  EOF  coefficients  correlate 
strongly  with  the  observed  motion.  Therefore,  use  of  EOF 
coefficients  to  represent  the  geopotential  patterns  in  the 
environment  of  a  tropical  cyclone  appears  to  be  a  valid 
approach  for  incorporation  of  synoptic  information  into  a 
statistically  based  forecast.  Incorporation  of  synoptic 
forcing  by  using  EOF  coefficients  appears  to  have  potential 
in forecasting tropical storm motion. Using an independent
sample, an average improvement of 17% relative to JTWC
official motion forecasts was obtained using the 500mb EOF
regression equations. The use of 500mb equations gave
better forecasts than either the 700mb or 850mb equations.

In contrast, Brown (1981) found no significant difference
in  forecast  ability  in  a  map-typing  forecast  technique  using 
the  same  three  atmospheric  levels.  Since  this  is  only  a 
pilot  study,  the  good  results  shown  here  need  to  be  tested 
further  with  new  data  cases.  Several  conclusions  and  future 
applications  are  drawn  from  this  study. 

(1)  The  regression  equations  were  developed  with  a  fairly 
small  dependent  data  sample,  and  yet  gave  good  results  when 
tested  with  an  independent  sample.  As  the  number  of  useable 
storm  cases  for  the  dependent  sample  increases,  the  regres¬ 
sion  equations  should  become  progressively  more  refined.  As 
the dependent data size increases, in any regression scheme,
more  extreme  cases  are  typically  forecast  better.  Large 
forecast  errors  should  occur  less  frequently  with  a  larger 
data  sample. 

(2)  This  method  of  incorporating  synoptic  fields  into  the 
regression  equations  is  not  limited  to  observed  fields.  It 
is  likely  that  coefficients  derived  from  a  24-hour  forecast 
field  (from  dynamic  numerical  weather  prediction  models) 
would  improve  the  long  range  forecast.  As  seen  in  the  study, 
the accuracy of the regression equations decreased sharply
in time. This study used only the current observed field.
After  24  to  36  hours,  it  is  expected  that  the  forcing  from 
the  mid-latitudes  would  be  significantly  different.  Use 
of  a  24  hour  prognosis  field  might  give  a  better  representa¬ 
tion  of  the  forcing  in  the  long-range  forecast. 

(3)  The  model  is  extremely  simple.  Using  only  values 
representing  the  synoptic  forcing  in  a  limited  grid  region 
about the storm, past storm movement and an intensity
measure  (which  proved  to  be  of  little  value) ,  the  forecasts 
appear  to  be  very  good.  If  variables  representing  other 
physical  features  thought  to  impact  storm  movement  are 
incorporated  into  the  regression  equations,  even  better 
forecasts  should  be  possible.  It  is  possible  that  the  phase 
of  equatorial  planetary  waves  near  the  storm,  and  other 
large  scale  circulation  features  may  play  a  role  in  tropical 
storm  movement.  These  waves  are  not  easily  detected. 
Holton (1972) notes that these waves are usually only
identifiable  in  the  stratosphere,  although  they  extend 
throughout  the  troposphere  and  stratosphere.  It  is  possible 
that  these  waves  could  be  identified  using  an  EOF  analysis 
of  the  global  band  in  the  tropics  at  a  mid-tropospheric 
level.  For  instance,  a  global  tropical  grid,  with  coverage 
to  about  30 °N  and  30 °S  may  be  adequate  to  identify  these 
waves  (which  would  probably  be  seen  in  the  first  5  to  10 
eigenvectors) .  These  EOF  coefficients  could  then  be 
incorporated  into  the  regression  equation.  A  global  grid 
could  also  possibly  detect  features  such  as  the  Walker 
circulation,  and  these  features  could  be  incorporated  into 
the regression forecast. A better storm intensity measure than the
maximum wind used in this study needs to be found. Variables
such as the radius of maximum winds should be tested as the
data  become  available.  The  potential  predictors  that  could 
be  included  are  certainly  not  limited  to  those  mentioned 
above. 

(4)  The  model  was  developed  for  use  in  the  western  North 
Pacific  Ocean  genesis  basin,  although  the  method  could  be 
developed  for  other  genesis  regions.  The  only  difference 
in  the  different  regions  would  be  in  the  values  of  the 
regression  coefficients. 

(5)  Rotation  of  eigenvectors  could  also  be  tried  to 
improve  the  model.  If  this  were  to  be  done,  the  number  of 
retained vectors would have to be larger, to guard against
underfactoring.

(6) Application of the EOF scheme in its present form
would  be  a  simple  matter.  In  fact,  if  the  regression 
equations  were  updated  only  once  a  year,  the  entire  forecast 
could  conceivably  be  obtained  on  a  hand-held  programmable 
calculator  with  sufficient  memory  to  store  the  mean  and 
standard  deviation  of  the  grid  points  and  all  eigenvectors. 
Entry  of  the  data  at  the  120  grid  points  is  all  that  would 
be  required  to  generate  the  movement  forecast.  The  grid 
point  data  might  be  obtained  using  a  Bessel  linear  inter¬ 
polation  from  the  63  X  63  FNOC  analysis.  Therefore,  the 
scheme  could  be  implemented  for  operational  use  with  a 
minimum  effort. 

In  conclusion,  the  EOF  regression  scheme  shows  great 
promise  for  improvement  of  operational  forecasts  of  tropical 
storm  movement.  In  this  pilot  study,  using  a  very  simple 
model,  the  scheme  performed  very  well.  Potential  improvement 
is  possible  through  addition  of  more  sophisticated  physical 
forcing  parameters  and  forecast  dynamic  fields  that  may 
affect  storm  movement.  Further  research  in  this  area  is 
definitely  warranted. 
APPENDIX A


700  AND  850MB  EIGENVECTORS 

The first 10 eigenvectors for the 700 and 850mb levels
follow. These are the vectors used in deriving the coefficients
used in the regression equations.


Al-1  except  for  eigenvector  8. 


Fig.  Al-19.  Similar  to  Fig.  Al-11  except  for  eigenvector  9. 


Fig.  Al-20.  Similar  to  Fig.  Al-11  except  for  eigenvector  10 
TABLE B-2


Regression coefficients for the seven zonal
equations using 700mb EOF's. A value of .0 indicates
the predictor was not selected in the stepwise
selection procedure.


FORECAST VALID FOR BASE TIME PLUS HOURS
12  24  36  48  60  72  84


Intercept
Cof1
Cof2
Cof3
Cof4
Cof5
Cof6
Cof7
Cof8
Cof9
Cof10
Plat1
Plat2
Plat3
Plat4
Plat5
Plat6
Plon1
Plon2
Plon3
Plon4
Plon5
Plon6
Aaw0
Aaw1
Aaw2
Aaw3


28.246 

1.759 

.0 

-2.618 


52. 181 
4.857 
2.463 

-6.010 

4.774 

5.875 

.0 

.0 

-5.380 

.0 

.0 

.0 

.0 

.0 

:8 

-0.  261 

-1.486 
.  0 
.0 
.0 
.  o 
.0 
.0 
.0 
.0 

-0.400 


63.395 
•10.074

15.*  §89 
:8


:8 
.  )

.0 


:8 


0 
.0
.0 

:l 
.  0 

.0 

.0 

.0 

.0 


206.607 

19.922 

12.687 

-16.974 

26  .*§00 
.0 
.0 

-21.056 

.0 

.0 

-1.000 
:8 
:8 
-2I3 


:8 


25 


*8 

.0 

•8 

.0 

•8 
•  0 
333.156
23. ^70 

-Idt  865 
-24.452
.0 

-31.349 

.0 

.0 

.0 
.0
*8
.  0 

.0 

.0 

.0 

.0 

.0 

-2.259 

.0 


382.634 

24.606 

.0 

-29.982 

5 ll 8  12 

-28.619 

.0 

-50.614 

.0 

.0 

.0 

.0 

.0 
.0

.0 
.0

.0 

.0 

.0 

.0 

.0 

-2.206 

.0 
TABLE B-3


Regression coefficients for the seven meridional
equations using 850mb EOF's. A value of .0 indicates
the predictor was not selected in the stepwise
selection procedure.


               FORECAST VALID FOR BASE TIME PLUS HOURS

                 12       24       36       48       60       72       84

Intercept    26.555   55.682   77.256  121.233  211.106  324.600  207.533
Cof1             .0       .0       .0       .0       .0       .0       .0
Cof2          1.154    3.641    7.988   11.981   16.514   11.960       .0
Cof3          3.865    9.081   17.534   19.859   31.471   13.913   38.864
Cof4             .0       .0       .0       .0       .0   33.760       .0
Cof5             .0       .0       .0       .0       .0       .0       .0
Cof6         -3.117       .0       .0       .0  -24.926   22.221       .0
Cof7          2.812       .0       .0       .0       .0   41.016
Cof8             .0       .0       .0       .0       .0       .0
Cof9          3.894    9.170       .0       .0       .0       .0       .0
Cof10            .0       .0       .0       .0       .0       .0       .0
Plon1            .0       .0   -0.404   -0.723       .0       .0       .0
Plon2            .0       .0       .0       .0       .0       .0       .0
Plon3            .0       .0       .0       .0   -0.358   -0.764       .0
Plon4            .0       .0       .0       .0       .0       .0   -1.470
Plon5            .0       .0       .0       .0       .0       .0       .0
Plon6        -0.114   -0.331       .0       .0       .0       .0       .0
Plat1        -0.593   -1.542   -2.147   -2.477   -2.457       .0   -2.992
Plat2        -0.089       .0       .0       .0   -2.409       .0
Plat3            .0       .0       .0       .0       .0       .0
Plat4            .0       .0       .0       .0       .0       .0
Plat5            .0       .0       .0       .0       .0       .0       .0
Plat6            .0       .0       .0       .0       .0       .0       .0
Aaw0             .0       .0       .0       .0       .0       .0       .0
Aaw1             .0       .0       .0       .0       .0       .0       .0
Aaw2             .0       .0       .0       .0       .0       .0       .0
Aaw3         -0.208   -0.481   -0.794   -0.856   -1.697   -2.407       .0
TABLE B-4


Regression coefficients for the seven zonal
equations using 850mb EOF's. A value of .0 indicates
the predictor was not selected in the stepwise
selection procedure.


FORECAST VALID FOR BASE TIME PLUS HOURS
12  24  36  48  60  72  84


Intercept
Cof1
Cof2
Cof3
Cof4
Cof5
Cof6
Cof7
Cof8
Cof9
Cof10
Plon1
Plon2
Plon3
Plon4
Plon5
Plon6
Plat1
Plat2
Plat3
Plat4
Plat5
Plat6
Aaw0
Aaw1
Aaw2
Aaw3


29.935 

.0 

-2.286 

2.383 

.0 

1.886 

4.692 

.0 

.0 

4.569 

•8 

0.393 

-0 ! 272 
.0 
.0 
.0 
.0 
.0 
.0 
.0 
.0 
.0 
.0 
.0 
.0 


72.723 

.0 

-5.569 

4.675 

.0 

4.859 

11.561 

5.729 

.0 

7.3  27 

.0 

0.807 

.0 

:« 

.0 

.0 

.0 

0.  192 

.0 

-0.415 

.0 

.0 

.0 

.0 

.0 

.0 


92.290 

.0 

-9.653 

.0 

:8 

18.413 

9.021 

.0 

9.740 

.0 

1.011 

.0 

:8 

.0 

:8 

.0 

.0 

.0 

.0 

.0 

.0 

0.486 

.0 

.0 


158.753 
-7!  21 3 

":i92 

.  0 

17. 100 
9.773 
.  0 
.0 

ll?74 

.0 

:8 

.0 

:« 

.0 

.0 

.0 

.0 

.0 

.0 

.0 

:8 


210.892 

.0 

-9.184 

15.976 

.0 

.0 

23.592 

.0 

.0 

.0 

il  Isa 


190.344 

-15.096 
14.822 
.  0 
APPENDIX C

MODIFIED REGRESSION EQUATION RESULTS


The enclosed table gives the R2 statistic, and the sample size
for each atmospheric level, for the modified regression equations.
These equations were derived using only 13 potential predictors,
the 10 coefficients, Plat1, Plon1 and Aaw1. The values may be
compared with Table 5-3 using the entire set of 26 predictors.


TABLE C-1


Sample size and R2 statistic for each zonal and meridional
modified regression equation by forecast time and atmospheric
level.


                        FORECAST INTERVAL (HR)

                    12     24     36     48     60     72     84

NUMBER OF
DEPENDENT
DATA CASES         409    409    387    307    281    203    184

ZONAL EQUATIONS

500mb             .777   .714   .672   .594   .549   .519   .457
700mb             .758   .695   .649   .574   .544   .541   .470
850mb             .738   .676   .614   .536   .497   .503   .456

MERIDIONAL EQUATIONS

500mb             .483   .441   .395   .325   .229   .252   .169
700mb             .455   .435   .378   .315   .223   .202   .145
850mb             .431   .396   .337   .285   .225   .219   .111
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